This paper presents a streaming (sequential) protocol for universal entanglement concentration
at the Shannon bound. Alice and Bob begin with $N$ identical (but unknown) two-qubit pure states,
each containing $E$ ebits of entanglement. They each run a reversible algorithm on their qubits, and
end up with $Y$ perfect EPR pairs, where $Y = NE \pm O(\sqrt{N})$. Our protocol is streaming, so the
input systems are fed in one at a time, and perfect EPR pairs start popping out almost immediately.
It matches the optimal block protocol exactly at each stage, so the average yield after $n$ inputs is
$\langle Y \rangle = nE - O(\log n)$. So, somewhat surprisingly, there is no tradeoff between yield and lag - our
protocol optimizes both. In contrast, the optimal $N$-qubit block protocol achieves the same yield,
but since no EPR pairs are produced until the entire input block is read, its lag is $O(N)$. Finally,
our algorithm runs in $O(\log N)$ space, so a lot of entanglement can be efficiently concentrated using
a very small (e.g., current or near-future technology) quantum processor. Along the way, we find 
an optimal streaming protocol for extracting randomness from classical i.i.d. sources and a more 
space-efficient implementation of the Schur transform. 



Entanglement between two distant parties is an es- 
sential ingredient in quantum communication primitives 
such as teleportation [1] and dense coding [2]. It is fungi- 
ble, and can be transformed with negligible loss between 
different bipartite states, but the standard currency is 
EPR pairs, two-qubit states of the form 

$$|\Phi^+\rangle = \frac{|0_A 0_B\rangle + |1_A 1_B\rangle}{\sqrt{2}}, \qquad (1)$$

where the separated parties "Alice" and "Bob" each pos- 
sess one qubit. Most information processing protocols 
that use entanglement are designed to use perfect EPR 
pairs, so if the parties have some generic entangled state 
$\rho_{AB}$, their first order of business is to transform it into
EPR pairs. This is called entanglement concentration if
the initial state is pure [5], and entanglement distillation
if it is mixed [4]. For pure states, the appropriate measure
of entanglement is given by the von Neumann entropy of
the reduced density operator of either subsystem [S]. A
partially entangled pure state 

$$|\psi\rangle = \sqrt{p}\,|0_A 0_B\rangle + \sqrt{1-p}\,|1_A 1_B\rangle \qquad (2)$$

has entanglement $H(p) = -\left(p\log p + (1-p)\log(1-p)\right)$.
This means that if Alice and Bob collect $N$ pairs, and
$N$ is large, then they can concentrate their entanglement
into approximately $NH(p)$ EPR pairs. Remarkably,
this requires no communication; they can do it by
independently performing local reversible computations 
[3]. However, existing protocols for entanglement concentration
[3, 6, 7] are block algorithms; that is, Alice and
Bob must process all $N$ qubits together. This approach
has two drawbacks: lag and memory. Alice and Bob 
get no EPR pairs until all N input qubits have arrived, 



and they need $N$-qubit quantum computers to store and
process all the input qubits. The experimental state of 
the art - roughly 10 qubits as of this writing - cannot 
achieve the large block sizes required to approach opti- 
mality. So let us explore what can be achieved with a 
small quantum information processor. 

We could solve the lag and memory problems by breaking
the input stream into blocks of length $N_0$, processing
them one at a time. But this also introduces error and/or
inefficiency. Not even a single perfect EPR pair can be
extracted with certainty from a block of finite length $N_0$.
If Alice and Bob are willing to settle for slightly distorted
EPR pairs, then they can do much better. They can extract
$N_0 H(p) - O(\sqrt{N_0})$ pairs, each of which has fidelity
$1 - e^{-O(N_0)}$ with a perfect EPR pair. However, this protocol
cannot approach the Shannon bound for fixed $N_0$;
the $O(\sqrt{N_0})$ term represents wasted entanglement. A
better approach is to let each block yield a variable number
of EPR pairs. This achieves an average yield of up
to $N_0 H(p) - O(\log N_0)$ pairs per block, which still falls
short of the Shannon bound for finite $N_0$.

In general, there might be a tension between two goals: 
achieving the Shannon bound for large $N$, and getting
out perfect EPR pairs as quickly as possible for small
and intermediate $N$. In fact, these goals can both be
achieved at the same time. In this paper, we present a 
sequential (a.k.a. instantaneous, streaming, or online) 
protocol that reads in partially entangled pairs one at a 
time and outputs perfect EPR pairs as they're generated. 

Theorem 1. Let Alice and Bob share many copies of
a bipartite pure state $|\psi\rangle$ with entanglement $E$. There
exists an entanglement concentration protocol that Alice
and Bob run independently, in parallel, and sequentially
on their sequences, which has the following properties.

1. After both parties have processed $N$ qubits, the expected
yield is $NE - O(\log N)$ perfect EPR pairs -
i.e., the optimal rate is achieved.

2. This holds for every $N$, so the lag time is $O(\log N)$.

3. The algorithm works for all input states.

4. It uses only $O(\log N)$ qubits of memory.

This protocol is fully reversible and coherent, involving
no measurements. As a result, the points in Theorem 1
are not independent. E.g., since the algorithm
is reversible, it does not destroy any entanglement - and
since it runs in $O(\log N)$ memory, at least $NE - O(\log N)$
bits of entanglement must have been emitted at any time
$N$.

The rest of this paper constitutes the proof of Theorem 1.
It is organized as follows. In Section I we discuss
data compression and show that quantum variable-length
compression codes are not suitable for entanglement concentration.
In Section II we turn to classical randomness
extraction, and discuss Elias's optimal block extractor.
In Section III we construct a streaming version of Elias's
randomness extraction protocol, and show that it can be
used for entanglement concentration when the Schmidt
basis is known. In Section IV we build a fully universal
protocol by combining our extraction protocol with the
quantum Schur transform.



I. DATA COMPRESSION: WHY IT DOESN'T 
WORK 

There is a deep link between entanglement concentra- 
tion and quantum data compression. Given the state in 
Eq. (2), Alice and Bob each describe their nth input qubit
by a density matrix

$$\rho_A = \rho_B = \rho = p\,|0\rangle\langle 0| + (1-p)\,|1\rangle\langle 1|, \qquad (3)$$



with entropy H(p) < 1, all of which is due to entangle- 
ment with the other party. Concentrating the entangle- 
ment contained in N input qubits into M EPR pairs, for 
which the parties' reduced states are 
$$\rho_A = \rho_B = \frac{1}{2}\left(|0\rangle\langle 0| + |1\rangle\langle 1|\right), \qquad (4)$$



means compressing the entropy of N copies of p into M 
maximally mixed states. Done reversibly, this is data 
compression. Indeed, the original entanglement concentration
protocol of Bennett et al. [3] is essentially a
block compression algorithm, followed by a measurement 
of Hamming weight. 

Seeking a sequential protocol for entanglement con- 
centration, we might therefore turn to sequential data 
compression protocols. Some of the oldest and best- 
known methods of classical data compression are of this 
type. Variable-length protocols such as Huffman coding 
and arithmetic coding replace each input symbol with a 
codeword whose length depends on the symbol's prob- 
ability. Quantum algorithms for Huffman coding and 
arithmetic coding exist [8, 9] (the Chuang-Modha algorithm
for arithmetic coding is actually a block protocol,
but there's no fundamental obstacle to sequential quan- 
tum arithmetic coding). However, the total length of the 



transmission is entangled with the messages being sent. 
So, although the encoder can compress sequentially, the 
decoder must wait until the end of the transmission to 
start decoding. 

For this reason, variable-length compression does not 
accomplish entanglement concentration. Even under op- 
timal circumstances (i.e., where a Huffman code with 
block-length 1 achieves the Shannon bound for compres- 
sion), Alice and Bob's output qubits are not perfect EPR 
pairs. Although each party's nth output qubit is indeed 
maximally mixed (as it should be, if it's to be half of 
an EPR pair), it is correlated with subsequent output 
qubits, e.g., the $(n+1)$th qubit. This correlation decoheres
the EPR pairs.

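To see a simple example of this, consider a
4-dimensional input Hilbert space spanned by
$\{|a\rangle, |b\rangle, |c\rangle, |d\rangle\}$, and a source that emits

$$|\psi\rangle = \frac{1}{\sqrt{2}}|aa\rangle + \frac{1}{2}|bb\rangle + \frac{1}{\sqrt{8}}|cc\rangle + \frac{1}{\sqrt{8}}|dd\rangle, \qquad (5)$$

$$\rho_A = \rho_B = \frac{1}{2}|a\rangle\langle a| + \frac{1}{4}|b\rangle\langle b| + \frac{1}{8}|c\rangle\langle c| + \frac{1}{8}|d\rangle\langle d|. \qquad (6)$$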



This distribution, with entropy H = 1.75 bits, can be 
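compressed perfectly into qubits by the following Huffman
code:

$$|a\rangle \to |0\rangle, \quad |b\rangle \to |10\rangle, \quad |c\rangle \to |110\rangle, \quad |d\rangle \to |111\rangle. \qquad (7)$$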



If Alice and Bob each apply this protocol to their input 
streams, the first partially-entangled pair is transformed 
to 



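$$|\psi\rangle_{\rm out} = \frac{1}{\sqrt{2}}|0\rangle_A|0\rangle_B + \frac{1}{2}|10\rangle_A|10\rangle_B + \frac{1}{\sqrt{8}}|110\rangle_A|110\rangle_B + \frac{1}{\sqrt{8}}|111\rangle_A|111\rangle_B. \qquad (8)$$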



Consider the reduced state of Alice's and Bob's first out- 
put bits, obtained by tracing out the 2nd and 3rd bits 
(the string is implicitly zero-padded, so unspecified bits 
are in |0)). In the basis {|00) , |01) , |10) , |11)}, it is 



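$$\rho_{\rm out} = \begin{pmatrix} \frac{1}{2} & 0 & 0 & \frac{1}{2\sqrt{2}} \\ 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \\ \frac{1}{2\sqrt{2}} & 0 & 0 & \frac{1}{2} \end{pmatrix}. \qquad (9)$$

This state's fidelity with an EPR state is only $\approx 0.85$. Furthermore (and this is important!), since this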
protocol is sequential, it will never go back and change 
the first bit. Nothing that Alice and Bob do to subse- 
quent output bits can enhance the entanglement of their 
first pair; it will always be defective. 
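
The fidelity quoted above can be checked numerically. The following is a minimal sketch, assuming the source amplitudes $1/\sqrt{2}, 1/2, 1/\sqrt{8}, 1/\sqrt{8}$ and the Huffman code $a \to 0$, $b \to 10$, $c \to 110$, $d \to 111$ with zero-padding to three bits; the names `CODE`, `AMP`, `first_pair_density_matrix`, and `epr_fidelity` are illustrative and not from the paper.

```python
from math import sqrt

# Huffman codewords, zero-padded to three bits (assumed code, see lead-in).
CODE = {"a": "000", "b": "100", "c": "110", "d": "111"}
# Schmidt coefficients of the source state (assumed amplitudes, see lead-in).
AMP = {"a": sqrt(1 / 2), "b": sqrt(1 / 4), "c": sqrt(1 / 8), "d": sqrt(1 / 8)}


def first_pair_density_matrix():
    """Reduced state of (Alice's first bit, Bob's first bit) as a sparse dict."""
    rho = {}
    for x, ax in AMP.items():
        for y, ay in AMP.items():
            # Tracing out bits 2 and 3 keeps only terms whose tails agree.
            if CODE[x][1:] != CODE[y][1:]:
                continue
            ket = CODE[x][0] * 2  # Alice and Bob hold the same symbol
            bra = CODE[y][0] * 2
            rho[(ket, bra)] = rho.get((ket, bra), 0.0) + ax * ay
    return rho


def epr_fidelity(rho):
    """<Phi+| rho |Phi+> for |Phi+> = (|00> + |11>)/sqrt(2)."""
    return 0.5 * sum(rho.get((k, b), 0.0) for k in ("00", "11") for b in ("00", "11"))


fidelity = epr_fidelity(first_pair_density_matrix())
```

Only the $|a\rangle$ and $|b\rangle$ branches share the same padded tail, so they alone contribute coherence between $|00\rangle$ and $|11\rangle$; the result is $\frac{1}{2} + \frac{1}{2\sqrt{2}}$.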

This failure reflects an inherent property of variable-length
codes: each output symbol is correlated with the
length of the entire output (see [5] for the first mention
of this issue, but in a different context). The correlation
is indirect, for both the individual output symbols and
their overall length are determined by the input symbols.
For a rather extreme example, recall that the Huffman
code given above maps $|a\rangle \to |0\rangle$ and $|d\rangle \to |111\rangle$. If the
output string contains a high proportion of $|0\rangle$ qubits, then
the input string must have contained a lot of $|a\rangle$ symbols,
and therefore the output string will be relatively short. A
high proportion of $|1\rangle$ qubits, on the other hand, means
that the input contained a lot of $|d\rangle$ symbols, and so the
output is relatively long. This correlation is enough to
decohere each individual output EPR pair.



II. EXTRACTING RANDOMNESS 

This failed experiment in using standard data com- 
pression demonstrates a key point: in sequential concen- 
tration, Alice's nth output qubit must not be correlated 
with anything except Bob's nth output qubit (and vice- 
versa), from the moment it is written down. No sub- 
sequent actions by the concentrator can fix a defective 
output. The first step in a protocol that emits a stream 
of perfect EPR pairs is to generate just one perfect EPR 
pair. This cannot be done deterministically with a finite 
number of input qubits, but it can be done conditionally 
- i.e., if a pair is generated, then it is perfect. 

If Alice and Bob know their shared state, then extract- 
ing a perfect EPR pair is closely related to a classical 
problem: "How do we extract a perfect independent ran- 
dom bit from a stream of biased, i.i.d., random bits?" It 
is critical that each extracted bit be independent of ev- 
erything, including other random bits and the processor's 
memory. 



A. Von Neumann's protocol 

Von Neumann addressed this problem in 1951 [10]. He
proposed sampling the biased bits two at a time. The 
odd-parity sequences "01" and "10" have equal probabil- 
ity, so if the first two bits have odd parity, Von Neumann 
reports the first bit. If we draw two bits with even parity 
("00" or "11"), we discard them and draw another pair. 

Each time a pair is drawn, the Von Neumann scheme
emits a random bit with probability $2p_0p_1$, and fails with
probability $1 - 2p_0p_1$. The number $n$ of input bit pairs
required to get a single random bit is exponentially distributed,

$$\Pr(n) = 2p_0p_1\,(1 - 2p_0p_1)^{n-1}, \qquad (10)$$



and the expected waiting time for the first random bit, measured in input bits, is

$$\langle n \rangle = \frac{1}{p_0p_1}. \qquad (11)$$

Since the protocol is completely Markovian, the rate at
which randomness is extracted is

$$R = \frac{dN_{\rm rbits}}{dn} = p_0p_1. \qquad (12)$$



This is quite a bit less than the theoretical upper bound, 
$R_{\max} = H(p_0)$, because Von Neumann's protocol wastes
a lot of entropy. However, it is a sequential protocol, and 
it can be used for entanglement concentration. Alice and 
Bob each run the following algorithm: 

1. Draw two qubits $q_1$, $q_2$ from the input.

2. Perform a CNOT (in the Schmidt basis) from $q_1$ to
$q_2$. This stores the parity, $q_1 \oplus q_2$, in $q_2$.

3. Conditional on $q_2 = 1$, swap $q_1$ with the "output
register", and halt; otherwise draw two new
qubits and repeat.
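
The classical counterpart of this loop (pairs in, at most one unbiased bit out per pair) can be sketched in a few lines. This is an illustration of Von Neumann's classical rule only, not of the coherent quantum circuit; the function name `von_neumann_extract` is ours, and unlike the single-pair algorithm above it keeps going after each success instead of halting.

```python
def von_neumann_extract(bits):
    """Classical Von Neumann extractor: one output bit per odd-parity input pair."""
    out = []
    # Consume the input two bits at a time; a trailing unpaired bit is ignored.
    for q1, q2 in zip(bits[0::2], bits[1::2]):
        if q1 ^ q2:         # odd parity: "01" and "10" are equally likely
            out.append(q1)  # report the first bit of the pair
        # even parity ("00" or "11"): discard the pair and move on
    return out


print(von_neumann_extract([0, 1, 1, 0, 0, 0, 1, 1, 1, 0]))  # [0, 1, 1]
```

The parity test plays the role of the CNOT in step 2, and appending `q1` plays the role of the conditional swap in step 3.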

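Thus, if Alice and Bob share multiple copies of the entangled
state

$$|\psi\rangle = \sqrt{p_0}\,|0_A 0_B\rangle + \sqrt{p_1}\,|1_A 1_B\rangle, \qquad (13)$$

the first two copies are transformed as follows:

$$\begin{aligned} |\psi\rangle|\psi\rangle &= p_0\,|00_A\rangle|00_B\rangle + p_1\,|11_A\rangle|11_B\rangle + \sqrt{p_0p_1}\left(|01_A\rangle|01_B\rangle + |10_A\rangle|10_B\rangle\right) \\ &\Rightarrow \left(p_0\,|0_A 0_B\rangle + p_1\,|1_A 1_B\rangle\right)|0_A 0_B\rangle + \sqrt{p_0p_1}\left(|0_A 0_B\rangle + |1_A 1_B\rangle\right)|1_A 1_B\rangle. \end{aligned} \qquad (14)$$

Conditional on Alice and Bob's second qubits each being
in state $|1\rangle$, their first qubits now form a perfect EPR
pair, $|\Phi^+\rangle$. Otherwise, their joint state is given by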



$$|\psi\rangle_{\rm fail} = \frac{p_0}{\sqrt{1 - 2p_0p_1}}\,|0_A 0_B\rangle + \frac{p_1}{\sqrt{1 - 2p_0p_1}}\,|1_A 1_B\rangle, \qquad (15)$$

and they each read another two qubits and repeat. Note
that there is substantial entanglement left in $|\psi\rangle_{\rm fail}$. In
Von Neumann's protocol, this entanglement is wasted,
and we will get a better protocol by recycling it.

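The protocol given above continues to draw pairs until
it succeeds, at which point it deposits an EPR pair
into Alice and Bob's first qubits and halts. Running this
coherently and in parallel on $2N$ copies of $|\psi\rangle$ gives

$$\sum_{k=0}^{N-1} \sqrt{2p_0p_1}\,(1 - 2p_0p_1)^{k/2} \left(|\psi\rangle_{\rm fail}|0_A 0_B\rangle\right)^{\otimes k} |\Phi^+\rangle|1_A 1_B\rangle\,|\psi\rangle^{\otimes 2(N-k-1)} + (1 - 2p_0p_1)^{N/2} \left(|\psi\rangle_{\rm fail}|0_A 0_B\rangle\right)^{\otimes N}. \qquad (16)$$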



The amplitude for not halting decreases exponentially 
with N, so for moderately large N we can be nearly cer- 
tain that an EPR pair has been deposited. 

This quantum Von Neumann protocol uses an indeterminate
and unbounded number $(2k + 2)$ of input pairs
to produce a single perfect EPR pair. If we want a
perfect EPR pair with certainty, then $N_{\rm input}$ must be
unbounded. A finite-sized block of partially entangled
states does not generally contain even a single perfect
EPR pair. Fortunately, $k$ is exponentially distributed, so
we can get an extraordinarily good EPR pair by terminating
the algorithm at relatively small $k$.

The algorithm can be iterated, without any modifi- 
cation, to extract a stream of EPR pairs. This is "on- 
demand" mode: the user requests exactly 1 (or n) EPR 
pairs, and the protocol reads as many input pairs as are 
needed. If we wait for a near-perfect EPR pair, then 
a lot of time is wasted. The algorithm probably (i.e., 
with large amplitude) halts at small k, and yet achiev- 
ing near certainty mandates waiting for longer (but low- 
amplitude) computational paths to terminate. 

Alternatively, we could replace the output register with 
an output tape, and replace the "halt" instruction with 
"push qi onto the tape and shift it by one qubit." Now, 
the algorithm never terminates unless it runs out of in- 
puts. As soon as it produces one EPR pair, it starts 
working on the next. In this fully streaming mode, the 
length of the output tape is always indeterminate, but k 
(the number of bits read so far) can be well-defined (e.g., 
if the algorithm terminates) . 

The protocols we will design in subsequent sections 
can be run in either mode. We will typically focus on the 
fully streaming mode, where the output tape's length is 
indeterminate, because it is compatible with a bounded 
input tape. In the quantum Von Neumann protocol, this 
mode is relatively unproblematic. To get an EPR pair, 
the user pops one off the end of the tape (without learning 
how long the output tape is). A problem occurs only if 
the user finds no available pairs, which implies that the 
output tape is empty. 

This is not true for other protocols, which recycle en- 
tanglement in order to achieve much higher efficiency. 
This recycling requires a coherent superposition of many 
output tape lengths. Disrupting this superposition (by 
issuing a failed request for an EPR pair that is, with 
some amplitude, not available) will reduce efficiency. So 
in these protocols, the first few squares of the output 
tape must be regarded as a sort of incubator - a re- 
gion where EPR pairs are almost certainly available, but 
should nonetheless not be used. Running the protocol 
in on-demand mode avoids this problem entirely (but re- 
quires an unbounded stream of inputs). 



B. Achieving the Shannon bound: Elias's protocol

Von Neumann's protocol wastes at least 75% of the
entropy in the input bits, and the corresponding entanglement
concentration protocol wastes an equal amount
of entanglement. Block protocols, in contrast, can extract
randomness or entanglement with asymptotically
perfect efficiency - i.e., at a rate given by the entropy of
the source, as $N \to \infty$. We will now develop a sequential
protocol that achieves the entropic bound as $N \to \infty$. In
fact, our protocol is a sequential implementation of the
optimal block protocol, and extracts at most 2 ebits less
than it.

Quite a few papers have followed up on Von Neumann's
work, generalizing and improving it. Early work focused
on the extraction of a single random bit, and sought to
minimize the expected number of input bits. Hoeffding
and Simons [11] represented algorithms as random
walks on the lattice of non-negative integer points in the
plane, $\{n_0, n_1\}$. Stout and Warren [12] represented algorithms
more generally as walks on binary trees. Other
authors (notably Samuelson [13] and Elias [2]) showed
how to extract random bits from $k$th-order Markov processes,
a particular kind of non-i.i.d. source. A flood of
more recent work (beginning with Trevisan's seminal paper
in 1998) has generalized the notion of extractors to
extremely general non-i.i.d. sources; this level of generalization,
however, is not relevant to our task.

Each of these single-bit extraction protocols can be
repeated (like Von Neumann's) to yield a stream of random
bits (or EPR pairs, in the context of entanglement
concentration). Such protocols never approach the Shannon
bound, since any residual entropy/entanglement in
the used input bits is wasted (Hoeffding and Simons [11]
proved an upper bound of $R = 1/3$ on the rate, and
demonstrated an algorithm that achieves $R \approx 0.323$ as
$p \to 1/2$). An efficient protocol has to somehow recycle
this entropy.

Elias seems to have been both the first and the last
to suggest an asymptotically efficient block protocol [13].
Elias's protocol, which is essentially unimprovable, uses
the fact that every $N$-bit string containing $T$ "1" bits has
probability

$$\Pr(N, T) = p^{N-T}(1-p)^T. \qquad (17)$$

The set of all such strings is a type class, containing exactly
$\binom{N}{T}$ strings with the same probability. If we draw
an $N$-bit string, then conditional on the type being $T$,
the index $\alpha \in 1 \ldots \binom{N}{T}$ of the particular string drawn
is a uniformly random variable with $\binom{N}{T}$ possible values.
If $\binom{N}{T}$ happens to equal $2^L$, then by writing this index
down in binary, we immediately get $L$ perfectly random
bits. Otherwise, we use the binary representation of $\binom{N}{T}$
to expand it as a sum of powers of 2,

$$\binom{N}{T} = \sum_k 2^{L_k}, \qquad (18)$$

and divide the interval $\mathcal{I} = \left[1 \ldots \binom{N}{T}\right]$ into bins

$$\mathcal{I}_{L_k} = \left[\sum_{i=0}^{k-1} 2^{L_i} + 1 \;\ldots\; \sum_{i=0}^{k} 2^{L_i}\right].$$

If the index $\alpha$ lies in the interval $\mathcal{I}_{L_k}$, we output $\alpha_{L_k} =
\alpha - \left(\sum_{i=0}^{k-1} 2^{L_i}\right)$ as an $L_k$-bit string.
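
As an illustration of the block protocol just described, here is a minimal classical sketch. It assumes a lexicographic ordering of strings within a type class, 0-based indices, and bins laid out largest-first; the text leaves the binning arbitrary, so these are choices made for the example, and the function names are ours.

```python
from math import comb


def rank_in_type_class(bits):
    """0-based lexicographic rank of `bits` among strings of equal length and weight."""
    rank, ones_left, n = 0, sum(bits), len(bits)
    for i, b in enumerate(bits):
        if b == 1:
            # Strings sharing this prefix but with a 0 here come first.
            rank += comb(n - i - 1, ones_left)
            ones_left -= 1
    return rank


def elias_extract(bits):
    """Return the random bits Elias's block protocol extracts from `bits`."""
    size = comb(len(bits), sum(bits))   # size of the type class
    alpha = rank_in_type_class(bits)    # index within the type class
    offset = 0
    for L in reversed(range(size.bit_length())):  # largest bin first
        if (size >> L) & 1:                       # a bin of size 2^L exists
            if alpha < offset + (1 << L):
                idx = alpha - offset              # L-bit index within the bin
                return [(idx >> i) & 1 for i in reversed(range(L))]
            offset += 1 << L
    raise AssertionError("alpha must fall in some bin")
```

For two input bits this reduces to Von Neumann's rule: `elias_extract([0, 1])` and `elias_extract([1, 0])` each yield one bit (with opposite values), while `[0, 0]` and `[1, 1]` yield none.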

Theorem 2. On average, Elias's protocol extracts at
least $NH(p) - \log_2(N+1) - 2$ bits of entropy from $N$
input bits, so as $N \to \infty$, it achieves the Shannon bound.

Proof. To prove this, we let $s$ be an $N$-bit string, and observe
that $s$ is equivalent to a pair of indices $(T, \alpha)$, where
$T$ is its type and $\alpha$ the index of $s$ within type $T$. Furthermore,
$\alpha$ is equivalent to a pair $(L, \alpha_L)$, where $\mathcal{I}_L$ is the
bin of $\binom{N}{T}$ containing $\alpha$, and $\alpha_L$ is its $L$-bit index within
$\mathcal{I}_L$. Since we output the entirety of $\alpha_L$, the extracted
entropy is $H(\alpha_L | T, L)$. Now, since $s \cong (T, L, \alpha_L)$, we
apply the chain rule for conditional entropy,

$$H(Y|X) = H(Y,X) - H(X), \qquad (19)$$

to obtain

$$\begin{aligned} H(\alpha_L | T, L) &= H(T, L, \alpha_L) - H(T, L) \\ &= H(s) - \left[H(T) + H(L|T)\right]. \end{aligned}$$

The input distribution has exactly $NH(p)$ bits of entropy,
so $H(s) = NH(p)$. Since there are only $N + 1$
types, $H(T) \le \log_2(N+1)$ (actually, $T$ is binomially
distributed, so $H(T) = \frac{1}{2}\log_2(N) + O(1)$). To calculate
$H(L|T)$, recall that

$$\binom{N}{T} = \sum_k 2^{L_k}, \qquad (20)$$

so $L$ takes values $\{L_k\}$ with probability

$$\Pr(L = L_k | T) = \frac{2^{L_k}}{\binom{N}{T}}, \qquad (21)$$

and $H(L|T)$ is just the entropy of this distribution. Now,
we can place an upper bound of 2 bits on $H(L|T)$ by the
following argument:

Let $n$ be an integer, with a binary expansion

$$n = \sum_k 2^{L_k}, \qquad (22)$$

where $L_0 > L_1 > \ldots > L_k$. This defines a probability
distribution over $L$,

$$\Pr(L_k) = \Pr(L = L_k) = \frac{2^{L_k}}{n}. \qquad (23)$$

Now, since $L_0 > L_1 > \ldots > L_k$, it's easy to see that
$\Pr(L_0) > \frac{1}{2}$ and that $\Pr(L_1) > \frac{1}{2}(1 - \Pr(L_0))$, and in
general that $\Pr(L_k) > \frac{1}{2}\Pr(L \le L_k)$. Thus $\Pr(L)$ majorizes
the infinite exponential distribution given by

$$\Pr(l) = 2^{-l}, \quad l = 1 \ldots \infty, \qquad (24)$$

whose entropy is exactly 2 bits. Since entropy is Schur-concave,
$H(L|T) \le H(l) = 2$. □
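
The bound of Theorem 2 can be checked numerically for small $N$. The sketch below computes the exact average yield by summing over types and bins; it assumes, as in the text, that a string with $T$ ones has probability $p^{N-T}(1-p)^T$. The function names are illustrative.

```python
from math import comb, log2


def binary_entropy(p):
    """H(p) in bits."""
    return 0.0 if p in (0.0, 1.0) else -(p * log2(p) + (1 - p) * log2(1 - p))


def elias_expected_yield(N, p):
    """Average number of bits Elias's block protocol extracts from N input bits."""
    total = 0.0
    for T in range(N + 1):
        size = comb(N, T)
        string_prob = p ** (N - T) * (1 - p) ** T
        # A bin of size 2^L is hit with probability 2^L * string_prob and yields L bits.
        total += string_prob * sum(
            L << L for L in range(size.bit_length()) if (size >> L) & 1
        )
    return total


def theorem2_lower_bound(N, p):
    return N * binary_entropy(p) - log2(N + 1) - 2
```

For example, `elias_expected_yield(2, 0.5)` equals 0.5, matching Von Neumann's $2p_0p_1$ for a single pair. Comparing `elias_expected_yield(N, p)` with `theorem2_lower_bound(N, p)` over a range of $N$ shows how loose or tight the $\log_2(N+1) + 2$ deficit is in practice.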



Like Von Neumann's protocol, Elias's protocol, if performed
coherently, gives an entanglement concentration
protocol. The original block concentration protocol [5]
uses a decomposition very similar to Elias's, while subsequent
work by Kaye and Mosca uses exactly this decomposition [6].
Whereas Von Neumann's protocol yields
either 0 or 1 EPR pairs, and can be repeated conditional
on failure to yield exactly 1 pair, Elias's protocol yields
a variable, binomially distributed number of EPR pairs.

Theorem 3. Elias's protocol, performed coherently on
$N$ copies of the bipartite state $|\psi\rangle$, has an average yield
of at least $NH(p) - \log_2(N+1) - 2$ EPR pairs, where
$|\psi\rangle = \sqrt{p}\,|0_A 0_B\rangle + \sqrt{1-p}\,|1_A 1_B\rangle$.


Proof. Alice and Bob begin with the state

$$\left(\sqrt{p}\,|0_A 0_B\rangle + \sqrt{1-p}\,|1_A 1_B\rangle\right)^{\otimes N} = \sum_{s \in \{0,1\}^N} \sqrt{\Pr(s)}\,|s_A s_B\rangle,$$

where the probability $\Pr(s)$ of a string $s \in \{0,1\}^N$ containing
$T(s)$ "1"s is

$$\Pr(s) = p^{N-T(s)}(1-p)^{T(s)}.$$

Each type class $\mathcal{T}_T$, labeled by its Hamming weight $T$,
defines a type subspace spanned by $|s_A s_B\rangle$ for all $s$ in
the type class. We can rewrite the joint state as a sum
over type subspaces,

$$|\psi\rangle^{\otimes N} = \sum_{T=0}^{N} \sqrt{\Pr(T)}\;\frac{1}{\sqrt{\binom{N}{T}}} \sum_{s \in \mathcal{T}_T} |s_A s_B\rangle,$$

where $\Pr(T) = \binom{N}{T} p^{N-T}(1-p)^T$. So if Alice and Bob
both measure $T$, then they both obtain the same value
$T$, which is distributed according to $\Pr(T)$. Conditional
on this measurement, they have

$$|\psi\rangle_T = \frac{1}{\sqrt{\binom{N}{T}}} \sum_{s \in \mathcal{T}_T} |s_A s_B\rangle,$$

which is a maximally entangled state of dimension $\binom{N}{T}$.
They now divide the strings of type $T$ into bins $\mathcal{C}_L$ of size
$2^L$. The specific binning is entirely arbitrary, as long as
Alice and Bob use the same one. Alice and Bob's state
is

$$|\psi\rangle_T = \sum_L \sqrt{\frac{2^L}{\binom{N}{T}}}\;\frac{1}{\sqrt{2^L}} \sum_{s \in \mathcal{C}_L} |s_A s_B\rangle.$$

As with $T$ above, Alice and Bob can measure $L$ and be
assured of getting the same answer $L$. Conditional on $L$,
they have

$$|\psi\rangle_{T,L} = \frac{1}{\sqrt{2^L}} \sum_{s \in \mathcal{C}_L} |s_A s_B\rangle = \left(\frac{|0_A 0_B\rangle + |1_A 1_B\rangle}{\sqrt{2}}\right)^{\otimes L},$$

so Alice and Bob now share $L$ EPR pairs (although they
are still distributed over $N$ physical qubits). The joint
distribution $\Pr(T, L)$ is identical by inspection to the one
in Theorem 2, so the expected yield is identical. □

III. STREAMING EXTRACTION 

Elias's protocol is a block algorithm; it operates on all
$N$ qubits at once. There have been relatively few attempts
to design efficient sequential extractors. Several
authors (including Elias) have observed that single-shot
protocols such as the Von Neumann or the Hoeffding-Simons
protocol can be repeated indefinitely, but that
they are far from optimal. Elias suggested a quasi-sequential
application of his protocol: apply it to the first
2 input bits, then the next 4, then the next 6, etc.
This is both strictly suboptimal for any $N$ (though it does
approach the Shannon bound as $N \to \infty$), and memory-intensive
as $N \to \infty$ (since the blocklength grows as
$\sqrt{N}$). Peres [15] showed how to iterate Von Neumann's
protocol, recycling the entropy in bits that have already
been used, but his protocol is not actually sequential.
Visweswariah et al. [16] suggested the use of variable-length
source codes as extractors, but Hayashi [17] subsequently
pointed out that the output bits are not quite
randomly distributed (see also our discussion above of the
problems this raises for entanglement concentration).

Our first goal is to construct a sequential extractor 
that achieves the Shannon bound (in fact, a streaming 
implementation of Elias's protocol). That is, we wish to 
construct an algorithm that reads bits one at a time, performing
some processing and outputting random bits as
they are produced, before reading the next bit. Further,
when $N$ bits have been read, for any given $N$, our protocol
extracts the same amount of randomness as Elias's
block protocol. We will first assume we have applied
Elias's protocol to a block of size $N - 1$, and investigate
how to extract the extra randomness produced by adding
one more input bit.

A. Serializing Elias's protocol 

We have seen that Elias's protocol represents an $N$-bit
input string $s$ as $(T, L, \alpha_L)$, where $T$ describes the type, $L$
represents the bin within the type, and $\alpha_L$ the $L$-bit
index within the bin. The index $\alpha_L$ consists of $L$ perfectly
random bits, and forms the output of the protocol. A
particular implementation of the protocol provides a particular
mapping from the strings within a given type to
the index pair $(L, \alpha_L)$. In order to construct a streaming
implementation, we first note that a mapping for $N$-bit
strings may be constructed in a convenient way from the
mapping for $(N-1)$-bit strings. Suppose that the $N - 1$
input bits $(b_1 \ldots b_{N-1})$ have already been transformed
into $(N-1, T_0, L_0, \alpha_{L_0})$ by Elias's protocol, and the $L_0$
random bits represented by $\alpha_{L_0}$ have been emitted. We
want to add one more bit $b_N$, updating the transformation
as $(N-1, T_0, L_0, \alpha_{L_0}) \to (N, T, L, \alpha_L)$. Since a
streaming protocol acts on strings of different lengths,
we also keep track of $N$, the number of bits read so far.
We will now describe this procedure in more detail.

Recall that each of the $\binom{N}{T}$ strings of type $(N, T)$ can
be obtained either by adding a "0" to one of the strings
of type $(N-1, T)$, or by adding a "1" to one of the strings
of type $(N-1, T-1)$. The strings of type $(N-1, T)$
have been sorted into bins such that bin $L$, if present,
contains $2^L$ strings, and no two bins have the same size.
Similarly for the strings of type $(N-1, T-1)$. When
we find ourselves in a bin $L$, that means we have already
output $L$ random bits. Except for the value of those
random bits, we treat all strings in the bin identically.
We don't want any two bins to be the same size, because
in that case, we could combine the two bins into a single
bin of twice the size, allowing us to output an additional
random bit. When we add an extra input bit and find
ourselves now in type $(N, T)$, we wish to see if we can
merge any bins, thus producing additional random output
bits.

The sizes of the types satisfy a recursion rule:

$$\binom{N}{T} = \binom{N-1}{T-1} + \binom{N-1}{T}.$$

Upon reading a new bit we update $N$ and $T$ to correspond
to the new number of bits read and the new type. We
also wish to use the mapping into bins and indices $(L, \alpha_L)$
for $(N-1)$-bit strings to define one for $N$-bit strings.
Denoting

$$\binom{N}{T} = \sum_i 2^{L_i}, \quad \binom{N-1}{T-1} = \sum_j 2^{L'_j}, \quad \binom{N-1}{T} = \sum_k 2^{L''_k}, \qquad (28)$$

we obtain

$$\sum_i 2^{L_i} = \sum_j 2^{L'_j} + \sum_k 2^{L''_k}. \qquad (29)$$

This is simply binary addition, and the rules of binary
addition also tell us how to update the bins $L$ and indices
$\alpha_L$. If both the types $(N-1, T-1)$ and $(N-1, T)$
have a bin of size $2^L$, we can merge them, outputting
a new random bit. This gives us a new "carry bin" of
size $2^{L+1}$, corresponding to the carry bit. Perhaps we
can merge this bin as well with another bin, producing
another random bit and a new carry bin, and so on.
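
To make the link with binary addition concrete, the following sketch lists the columns in which adding two type-class sizes generates a carry, i.e. the bin sizes $2^L$ at which a merge takes place. The helper name `carry_columns` is ours.

```python
from math import comb


def carry_columns(x, y):
    """Columns L at which adding x and y in binary carries into column L + 1."""
    cols, carry, L = [], 0, 0
    while (x >> L) or (y >> L) or carry:
        total = ((x >> L) & 1) + ((y >> L) & 1) + carry
        carry = total >> 1  # 1 if this column overflows
        if carry:
            cols.append(L)
        L += 1
    return cols


# Types (3, 0) and (3, 1) feed type (4, 1): 1 + 3 = 4.
print(carry_columns(comb(3, 0), comb(3, 1)))  # [0, 1]
```

Here the two size-1 bins merge into a carry bin of size 2, which in turn merges with the size-2 bin of type $(3, 1)$ to give the single size-4 bin of type $(4, 1)$.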

To construct a streaming implementation of Elias's protocol,
we simply read bits one at a time, performing the
above processing at each step. It is easily verified that
at $N = 2$ the above performs Von Neumann's protocol,
while for $N > 2$, by induction, following these rules gives
an implementation of Elias's protocol for each $N$. The
rules for what to do upon reading the $N$th bit are defined
by the triplet $(N-1, T, L)$, along with the bit just
read. In particular, they do not depend on the index $\alpha_L$
that identifies a particular string. So, since the $L$ bits of
$\alpha_L$ are not needed to process subsequent bits, they can
be ejected as soon as they are produced.

For every input string that causes the nth output bit 
to be "0", there is a matching string that (a) produces 
exactly the same memory state, (b) produces exactly the 
same output bits except for the nth one, and (c) yields a 
"1" for the nth output bit. This guarantees that the out- 
put bits are unbiased and uncorrelated with the memory. 
The memory state is completely specified by the three 
integers $(N, T, L)$, so the processor's memory need grow
only as $O(\log N)$.

B. Implementation 

This implementation may be conveniently represented 
by the lattice shown (up to $N = 5$) below. Each possible
input string corresponds to a different path through the 
lattice, and red dots indicate the fusion of two paths into 
a single node by outputting an unbiased bit. 




The nodes are labeled by three integers: 

• N, the number of input bits read so far, 

• T, the Hamming weight of the input string, 

• L, the number of random bits output so far. 

Note that this lattice is simply the lattice corresponding
to Pascal's triangle, that is, with each node representing
a different type class, but with the nodes $(N, T)$ subdivided
into $\{(N, T, L)\}$ for each value of $L$ in the binary
expansion of $\binom{N}{T}$. Since $(N, T, L)$ represents a collection
of $2^L$ strings with the same probability, $L$ will represent
the number of random bits output so far. The procedure
in the previous section tells us how to traverse the lattice.
We can formalize the lattice traversal with the following
rules.



Protocol 1. This protocol runs on a machine with three
integer memory registers labeled $N$, $T$, and $L$. Define
$\binom{N}{T}_L$ to be the $L$th bit of $\binom{N}{T}$.

1 WHILE ( input stream not empty ) DO
2 {
3     Read a bit b from the input stream.
4     Update N → N + 1 and T → T + b.
5     IF ( $\binom{N}{T}_L = 0$ or $\binom{N-1}{T-1+b}_L = 1$ )
6     {
7         output b and set L → L + 1.
8         WHILE ( $\binom{N-1}{T}_L \ne \binom{N-1}{T-1}_L$ )
9             output $\binom{N-1}{T}_L$ and set L → L + 1.
10     }
11 }
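
A minimal classical Python sketch of this protocol follows. It rests on two assumptions that should be read as our interpretation: binomial coefficients outside Pascal's triangle are treated as zero, and a carry path fuses again only when exactly one non-carry path enters the same node, in which case it emits the bit $\binom{N-1}{T}_L$. The names `bit` and `stream_extract` are illustrative.

```python
from math import comb


def bit(n, k, L):
    """L-th binary digit of C(n, k); zero outside Pascal's triangle."""
    if k < 0 or k > n:
        return 0
    return (comb(n, k) >> L) & 1


def stream_extract(bits):
    """Run the streaming extractor; returns (output_bits, (N, T, L))."""
    N = T = L = 0
    out = []
    for b in bits:
        N, T = N + 1, T + b
        # Fuse if (N, T, L) is reached by more than one path.
        if bit(N, T, L) == 0 or bit(N - 1, T - 1 + b, L) == 1:
            out.append(b)
            L += 1
            # We are now a carry path; keep fusing while exactly one
            # non-carry path enters the current node (assumed carry rule).
            while bit(N - 1, T - 1, L) != bit(N - 1, T, L):
                out.append(bit(N - 1, T, L))
                L += 1
    return out, (N, T, L)
```

For two input bits this reproduces a Von Neumann extractor: `stream_extract([0, 1])` and `stream_extract([1, 0])` each emit one bit (here the second input bit), and `[0, 0]` or `[1, 1]` emit nothing. The memory state is just the triple `(N, T, L)`; the emitted bits are never read back.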

Discussion: When stated concisely, the protocol is a 
bit cryptic, so here is an explanation of how (and why) 
it works. First, note that the protocol as given runs in 
"fully streaming" mode - i.e., it continues to read and 
write bits indefinitely. To make it run in "on-demand" 
mode, we change each instance of "output x" to "output
x and then pause." It's not sufficient to pause before
line 3, because (significantly) there is not a 1:1 corre- 
spondence between reading and outputting bits. 

The basic idea here is to read bits until the algorithm
arrives at an internal state $(N, T, L)$ that could have been
reached via two different paths with equal probability.
One path comes from $(N-1, T-1, L)$ by reading $b = 1$,
while the other comes from $(N-1, T, L)$ by reading $b = 0$.
Since $b$ identifies the path, and the two paths are equally
probable, $b$ is perfectly random. So the algorithm spits
it out.

This simple picture gets complicated because of carrying.
The nodes are in 1:1 correspondence with bits of
binomial coefficients. The existence of two paths coming
from $(N-1, T-1, L)$ and $(N-1, T, L)$ means that the
$L$th bits of $\binom{N-1}{T-1}$ and $\binom{N-1}{T}$ are both 1.
Adding them produces a carry bit in column $L+1$. Fusing the
corresponding paths produces a "carry path" corresponding to
node $(N, T, L+1)$. If that node could have been reached
in another way (from either or both of $(N-1, T-1, L+1)$
or $(N-1, T, L+1)$), then we need to fuse some more
paths.

The algorithm begins by reading a bit and updating
its internal state. Line 5 checks to see whether the
resulting internal state could have been reached in at least
two ways. If not, then no perfectly random bit is available,
and it reads another bit. How is this check performed?
Each node can be reached via one, two, or three
paths. $(N, T, L)$ has exactly one path leading into it if
$\binom{N}{T}_L = 1$ (meaning there's either one or three paths in),
and $\binom{N-1}{T-1+b}_L = 0$ (meaning there's only one non-carry
path, and thus no more than two paths in total). So if
either of these is false, then there are two or three paths
leading in.

This could actually happen in three different ways.
There could be:

1. Two non-carry paths, 

2. One non-carry path and one carry path, 

3. Three paths. 

By design, the algorithm outputs $b$ in all three sub-cases
(Line 7). It also updates $L \to L + 1$, then proceeds to
deal with the resulting carry path into $(N, T, L+1)$.

There is some freedom in how to deal with carry paths,
but the rule embodied by Line 7 eliminates all of it.
Recall that output bits have to be uniformly random. If a
node has two non-carry paths (and no carry path) leading
in, then we're in good shape - those paths have identical
probability, but different values of $b$, so the output bit
is random. If there is only one non-carry path, and it
entailed reading in a bit $b$, then the corresponding carry
path must output $1 - b$. Finally, if there are three paths,
then only two of them can fuse. By outputting $b$, we
are choosing to fuse the two non-carry paths - so the
corresponding carry path must not output anything.

Lines 8-9 ensure that the carry path is managed correctly.
When the algorithm finds itself in $(N, T, L)$ via a
carry path, there are two cases:

1. If there are either 0 or 2 non-carry paths into
$(N, T, L)$, then there's no path to fuse with, so the
algorithm should output nothing and read another
bit. This is true if the $L$th bits of $\binom{N-1}{T-1}$ and
$\binom{N-1}{T}$ are the same (both 0 means 0 non-carry paths,
while both 1 means 2 non-carry paths).

2. If those bits are different, then there is exactly 1
non-carry path into $(N, T, L)$. If that path came
from $(N-1, T, L)$, then its last input bit would have
been $b = 0$. To fuse with that path, our algorithm
should output a 1. Otherwise, the non-carry path
came from $(N-1, T-1, L)$, its last input bit would
have been $b = 1$, and the algorithm should output a
0. In either case, $\binom{N-1}{T}_L$ gives the correct output.
Outputting a bit yields another carry path, so the
WHILE statement ensures that we loop around to
line 8 and deal with it in turn.
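The path counting in this discussion is just column-by-column binary addition of $\binom{N-1}{T-1}$ and $\binom{N-1}{T}$. As an illustrative sketch (the helper names `c` and `paths_into` are ours, and bit positions start at 0), the number of paths entering a node can be tabulated like this:

```python
from math import comb

def c(n, k):
    """Binomial coefficient, taken to be 0 when k is out of range."""
    return comb(n, k) if 0 <= k <= n else 0

def paths_into(N, T, L):
    """Indicators (via_one, via_zero, carry) for paths entering node (N, T, L).

    via_one  : non-carry path from (N-1, T-1, L), taken by reading b = 1
    via_zero : non-carry path from (N-1, T,   L), taken by reading b = 0
    carry    : carry path produced by a fusion in column L - 1
    """
    a, b = c(N - 1, T - 1), c(N - 1, T)
    low = (1 << L) - 1
    carry = ((a & low) + (b & low)) >> L      # carry out of the low L bits
    return (a >> L) & 1, (b >> L) & 1, carry
```

For instance, `paths_into(4, 2, 1)` returns `(1, 1, 1)`, the three-path case, and `paths_into(4, 3, 1)` returns `(1, 0, 1)`, one non-carry path meeting one carry path. The parity of the three indicators is always the $L$th bit of $\binom{N}{T}$.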

We are going to use this algorithm as an entangle- 
ment concentration protocol, so it has to be completely 
reversible. In the description above, inputs and outputs 
are asynchronous - and the algorithm generally has to 
read bits at a higher rate than it can output EPR pairs. 
If these bits are physical systems (e.g., qubits), where are 
they going? 

To clarify this, we assume that the machine has access
to three I/O bitstreams or "tapes". The input tape is
read-only, the output tape is write-only, and the purity
tape is a read/write stack that functions as a reservoir of
clean "0" bits. Now the protocol is explicitly reversible:
of $N$ input bits, $n$ will be pushed onto the output tape,
and $N - n$ will be pushed onto the purity tape. However,
upon reading in a bit $b$, the protocol may pop one or more
bits off the purity tape, write random bits onto them, and
push them onto the output tape (line 8). On the other
hand, it may also erase $b$ (i.e., reversibly set it to "0")
and push it onto the purity tape (line 4). Lines 5 and
7 do not require any action on the purity tape.
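One classical consequence of reversibility can be checked by brute force: for a fixed number of inputs $N$, the final registers $(T, L)$ together with the output tape must determine the input string uniquely. The sketch below is ours (the names `bit`, `run`, and `is_injective` are hypothetical), and it counts the purity tape simply as $N - n$ clean bits rather than modelling the stack itself.

```python
from itertools import product
from math import comb

def bit(n, k, L):
    """Bit L of C(n, k), or 0 if k is out of range."""
    return (comb(n, k) >> L) & 1 if 0 <= k <= n else 0

def run(bits):
    """Protocol 1 with tape bookkeeping.

    Returns (T, L, output_tape, purity_count) for the given input bits.
    """
    N = T = L = 0
    out = []
    for b in bits:
        N, T = N + 1, T + b
        if bit(N, T, L) == 0 or bit(N - 1, T - 1 + b, L) == 1:
            out.append(b)
            L += 1
            while bit(N - 1, T - 1, L) != bit(N - 1, T, L):
                out.append(bit(N - 1, T, L))
                L += 1
    return T, L, tuple(out), N - len(out)    # N - n bits sit on the purity tape

def is_injective(N):
    """True if distinct N-bit inputs give distinct (T, L, output) records."""
    records = {run(s)[:3] for s in product((0, 1), repeat=N)}
    return len(records) == 2 ** N
```

Running `is_injective(N)` for small $N$ confirms that no information is lost: every input string leaves a distinct record in memory plus output.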

We can summarize this construction as a theorem, 
whose proof is the preceding analysis: 

Theorem 4. Protocol 1, applied to a series of $N$ bits,
implements Elias's protocol for optimal randomness
extraction.



C. Performance 

The algorithm described above is a sequential protocol 
for extracting perfectly random bits. But how well does 
it work? 

We begin by noting that our algorithm is sequential, 
but not instantaneous. A truly instantaneous protocol 
(like Huffman coding) is Markovian. Its action on a given 
input symbol does not depend on previous symbols, so 
it requires no memory from one symbol to the next. If 
the output is modeled as a tape, the algorithm needs to 
"remember" where it is on the tape, but an instantaneous 
protocol makes no additional use of this information. 

Our algorithm requires a memory register whose size
grows as $\log N$. However, any protocol that emits
uncorrelated asymptotically perfectly random bits and
achieves the Shannon bound must have a memory that
grows with $N$. Of the $NH(p)$ bits of entropy associated
with the first $N$ input bits, $\sim \log N$ bits are associated
with the Hamming weight, and cannot yet be distilled
into perfectly random bits. This entropy must be either:

1. written down on the output tape, 

2. discarded, or 

3. kept in memory until (with the addition of subse- 
quent bits) it becomes distillable. 

The first solution ensures that some output bits are not
perfectly random. The second solution prohibits achieving
the Shannon bound. The third solution requires
a memory whose size grows as $O(\log N)$ (and a
non-Markovian protocol).

Our protocol is reversible, so it discards no entropy at
all. It therefore not only achieves the Shannon bound,
but does so very tightly - the total amount of randomness
extracted from $N$ input bits is $NH(p) - O(\log N)$,
which follows immediately from reversibility and the
bounded size of the memory. Furthermore, it also
efficiently extracts purity, which in certain circumstances
may be more useful than randomness. We note that
the Schulman-Vazirani cooling algorithm [18] is also
constructed from classical randomness extraction protocols,
but is not streaming as it makes use of Peres' iterative
von Neumann protocol. Again, because the memory
for our algorithm is so small, we know that on average
$N(1 - H(p))$ pure bits will be ejected. Note that all of
these figures are average values - in any given experiment,
the yield of random and pure bits will fluctuate by
$O(\sqrt{N})$.
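The average yield can be evaluated directly from the block (Elias) picture, since the streaming protocol matches it after every input. The following is a rough numerical sketch under our own naming (`entropy`, `expected_yield`); it assumes i.i.d. input bits with $P(b = 1) = p$ and uses floating point, so it is only suitable for modest $N$ (a few hundred at most).

```python
from math import comb, log2

def entropy(p):
    """Binary Shannon entropy H(p) in bits."""
    if p in (0.0, 1.0):
        return 0.0
    return -p * log2(p) - (1 - p) * log2(1 - p)

def expected_yield(N, p):
    """Average number of random bits emitted after N inputs."""
    total = 0.0
    for T in range(N + 1):
        size = comb(N, T)
        # Within a type class, 2^L strings emit L bits for each set bit L
        # of the class size, so the class contributes sum(L * 2^L) bits.
        bits_out = sum(L << L for L in range(size.bit_length())
                       if (size >> L) & 1)
        total += p**T * (1 - p)**(N - T) * bits_out
    return total
```

Comparing `expected_yield(N, p)` with `N * entropy(p)` for increasing $N$ shows a gap that grows only slowly, in line with the $NH(p) - O(\log N)$ estimate above.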

D. Extracting entanglement 

This reversible protocol for extracting random bits can 
be adapted rather easily for entanglement concentration. 
The only extra necessity is that Alice and Bob must
implement the protocol not just reversibly, but also
coherently (i.e., on a quantum information processor [24]).
The data registers must be quantum registers that can 
support superposition states without decohering, and the 
logic gates must preserve quantum superposition. More- 
over, each "if-then" statement in the algorithm must be 
implemented as a controlled operation, e.g. a quantum 
CNOT gate, rather than involving a measurement and 
conditioning on that measurement. 

Suppose that a source produces pairs of systems one
at a time in the joint (Alice-Bob) state

$$|\psi\rangle = \alpha |0_A 0_B\rangle + \beta |1_A 1_B\rangle.$$

The reduced state of a single qubit on either Alice or
Bob's side is

$$\rho = |\alpha|^2 |0\rangle\langle 0| + |\beta|^2 |1\rangle\langle 1|.$$

Suppose Alice and Bob each run our protocol coherently
on their streams of qubits. After $N$ input bits have been
read, our streaming protocol has implemented Elias's
block protocol on them. Therefore, by Theorem 3 it
outputs perfect EPR pairs when performed coherently.
However, it is also instructive to consider why each
output pair, considered individually, is maximally entangled.

Locally, Alice and Bob will each see output streams of
maximally mixed qubits,

$$\rho_{\rm out} = \frac{1}{2} |0\rangle\langle 0| + \frac{1}{2} |1\rangle\langle 1|.$$

To show that all the entropy comes from entanglement
- i.e., Alice's $n$th output qubit forms an EPR pair with
Bob's $n$th qubit - let us consider just the first output
qubit.

1. Alice's and Bob's input bits are perfectly correlated,
and since the computational paths of their
algorithms depend only on these input bits, their
first output bit is perfectly correlated as well. That
is, if we were to measure Alice's first qubit and find
it in the $|0\rangle$ state, then we would surely find the
same result if we measured Bob's first qubit.

2. The algorithms that Alice and Bob run are completely
reversible. They involve no measurements
and no outside randomness. Furthermore, their
joint input states are pure and thus carry no entropy
at all. Thus, given that their first output
bits are perfectly correlated, these bits must form
an EPR pair unless they are decohered by some
other system. Such a system would have to be
correlated with Alice or Bob's first output qubit, and
it would have to be either another output qubit or
a qubit still stored in memory.

3. The algorithm can be configured (as discussed 
above) to pause after outputting exactly one bit. 
Thus, we can consider the first output qubit when 
there are no other output qubits, and so we can 
rule out the possibility that the first EPR pair is 
decohered by another output qubit. 

4. The memory registers of both Alice and Bob's
processors are uncorrelated with the state of the first
output bit. This follows quite simply from the way
we built the protocol: each path that outputs $|0\rangle$
is balanced with another path of the same length
and the same probability that outputs $|1\rangle$.
Furthermore, by outputting a qubit, the protocol explicitly
forgets which path it traversed. Thus, while Alice
and Bob's processors are each in a complicated
superposition of different computational basis states
(and are in fact highly entangled with each other),
neither is even slightly correlated with the value of
the first output bit.

This shows that Alice and Bob's first output qubits form 
an EPR pair. This EPR pair is utterly uncorrelated 
with anything else, particularly the memories of Alice 
and Bob's processors. It follows that when Alice and 
Bob distill out their second qubits, they too are perfectly 
correlated with each other, and uncorrelated with any- 
thing else - and therefore form an EPR pair, as do all 
subsequent pairs. 

We conclude this section by pointing out a limitation 
of the algorithm presented so far. It's basically a clas- 
sical algorithm, adapted to run on a quantum computer 
in the computational basis. Thus, it assumes and relies 
upon Alice and Bob's input states being diagonal in the 
computational basis. Of course, if the input states were
instead

$$|\psi\rangle = \alpha |{+}{+}\rangle + \beta |{-}{-}\rangle,$$

then we could modify the algorithm very simply - just
perform an SU(2) rotation on each input qubit to change
the Schmidt basis. However, we must know the Schmidt
basis of the input states. Our algorithm (as presented so
far) is a streaming implementation of the protocol
originally introduced by Bennett et al. in 1996 [3]. In
Section IV, however, we show how to lift this requirement,
constructing an algorithm for truly universal streaming
entanglement concentration, which doesn't require any
advance knowledge of the joint state (except a promise
that it's pure).

E. Computational complexity

Let us now consider the resources necessary to imple- 
ment our protocol. One of the main advantages of a 
streaming protocol over a block protocol is reduced mem- 
ory usage. The streaming protocol doesn't need to store 
the entire input block of $N$ qubits! Instead, our protocol
requires three integer registers for $N$, $T$, and $L$. Each
register must be fully quantum (i.e., capable of storing
arbitrary superpositions of integers), but only $\log(N)$ bits
in size, since both $T$ and $L$ are less than or equal to $N$.

Our algorithm also requires some temporary storage
to calculate its transitions between memory states. Most
of this calculation is trivial and can be done using $O(1)$
qubits. The one major exception is calculating $\binom{N}{T}_L$.
Each iteration of the algorithm has to calculate the $L$th
bit of two binomial coefficients. This is nontrivial. In
fact, at first glance it looks almost impossible, since $T$ is
typically $O(N)$, and

$$\binom{N}{T} = \frac{N!}{T!\,(N-T)!}$$

is $O(N)$ bits in size. Calculating it involves $O(N)$
multiplications and divisions of integers with $O(\log N)$ bits
each, and storing the result requires $O(N)$ bits of memory.

Fortunately, we only need to compute a single bit of
$\binom{N}{T}$. This removes any need to store a number with $O(N)$
bits. We can then take either of two routes (depending
on which is more convenient) to run the algorithm in
$O(\log N)$ qubits of memory.

1. We can run the entire algorithm - including computing
bits of binomial coefficients - on a quantum
processor, with no classical assistance at all. This
turns out to be possible because computing the $L$th
bit of $\binom{N}{T}$ is in the complexity class LOGSPACE.
Thus, temporary memory requirements can be held
to $O(\log N)$. However, this makes the algorithm design
much more complicated, and may slow it down
substantially (since we trade time for space).

2. We can precompute the binomial coefficients with
a classical processor. If we have ${\rm poly}(N)$ classical
memory, then this can be done relatively quickly,
and the results used to implement the quantum
protocol. The trick here is that the classical computer
cannot know the values of $N$, $T$, and $L$ - if
it did, it would decohere the computation. So the
classical computer has to calculate all of the $O(N^2)$
possible binomial coefficients. Though clumsy, this
approach is probably more practical for moderate
$N$, and minimizes the amount of quantum computation
necessary.
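The second route amounts to filling in Pascal's triangle once and then reading off individual bits. A minimal classical sketch is given below; the names `binomial_table` and `lookup_bit` are ours, and the table stores whole coefficients (so it uses ${\rm poly}(N)$ classical memory, as the text assumes).

```python
def binomial_table(N_max):
    """table[N][T] = C(N, T) for 0 <= T <= N <= N_max, via Pascal's rule."""
    table = [[1]]
    for N in range(1, N_max + 1):
        prev = table[-1]
        row = [1] + [prev[T - 1] + prev[T] for T in range(1, N)] + [1]
        table.append(row)
    return table

def lookup_bit(table, N, T, L):
    """Bit L (0 = least significant) of C(N, T); 0 if T is out of range."""
    if not 0 <= T <= N:
        return 0
    return (table[N][T] >> L) & 1
```

A call such as `lookup_bit(binomial_table(64), 4, 2, 1)` then replaces each on-the-fly evaluation of $\binom{N}{T}_L$ by a table read.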

Computing binomial coefficients is in LOGSPACE because
division and iterated multiplication are both in
LOGSPACE [19]. The quotient of two $N$-bit numbers,
or the product of $N$ $N$-bit numbers, can be computed in
$O(\log N)$ space. $N!$ is the product of $N$ numbers whose
size is $\log N$ bits, so we can compute it in LOGSPACE.
Three such computations yield $N!$, $T!$, and $(N-T)!$, and
computing the binomial coefficient involves two divisions.

This may seem paradoxical - how can an $N$-bit number
be computed in $O(\log N)$ bits of space? We are allowed
a machine with $O(\log N)$ read-write memory, plus an
unbounded read-only tape containing the problem
specification (e.g., the $N$-bit numbers to be divided, or the $N$
numbers to be multiplied), and an unbounded write-only
tape on which the answer will be written out. This model
is very adaptable to our problem. To calculate the $L$th
bit of $\binom{N}{T}$, we chain three such machines together.



[Figure: pipeline of three chained machines: Enumerator, Multiplier, Divider.]

The first has a $\log N$-sized input tape containing $N$
and $T$. It constructs (and passes to the second machine)
two long lists of integers: $\{1 \ldots N\}$ and
$\{1 \ldots T, 1 \ldots N-T\}$. This is easy to do with $O(\log N)$
memory. The second machine multiplies together the numbers
in these lists, and passes the results to the third machine
as two integers, $N!$ and $T!(N-T)!$. The third machine divides
these numbers, calculating only the $L$th bit of the result,
and outputs $\binom{N}{T}_L$.

Communication between the machines is accomplished
via queries. Instead of reading a long read-only tape, the
second and third machines tell their predecessor which
bit of a "virtual tape" they need, and the predecessor
computes it on the fly. This trades time for space, avoiding
the need for an $O(N \log N)$ memory tape, at the cost
of extra time complexity.

We do not know the time complexity of this approach,
but it seems unlikely to be low. Ideally, a streaming
protocol would process each input symbol in $O(1)$ time,
processing all $N$ symbols in $O(N)$ time. This is manifestly
impossible for an adaptive protocol, which has to maintain
and process some record of what it's read so far. The
size of that record grows as $O(\log N)$, which suggests a
lower bound of $\Omega(N\,{\rm polylog}\,N)$ for processing $N$ symbols
(since processing the $N$th symbol involves a
polynomial-sized computation on a memory of size $\log N$).

We do not know whether this can be achieved, but the
straightforward approach given above certainly doesn't.
Processing the $N$th symbol involves computing $\binom{N}{T}_L$,
and doing this with minimal space takes at least
$O(N\,{\rm polylog}\,N)$ time. This is itself only a lower bound.
Because the various stages in the computation query each
other, we can only say that the time complexity of
computing $\binom{N}{T}_L$ this way is $O({\rm poly}(N))$.

Reducing the per-symbol cost from $O({\rm poly}(N))$ to
$O({\rm polylog}\,N)$ would make our protocol much more useful
in practice. To do so, we need to avoid computing all
the bits of $N!$ to get just one bit of $\binom{N}{T}$. However, a
similar problem - computing $N! \bmod P$ (where $P > N$)
- is thought to be hard. An $O({\rm polylog}(P))$ algorithm
would yield an efficient algorithm for integer factoring.
So finding fast ways to exactly compute bits of binomial
coefficients, while theoretically interesting, is probably
not the best way to go about this. A more promising
approach is to approximate $\binom{N}{T}$ to fixed (or $O(\log N)$)
precision, e.g. with Stirling's approximation. Because $L$
is exponentially distributed, $L$ will almost always be very
close to $L_{\max} = \lfloor \log_2 \binom{N}{T} \rfloor$, so computing only the
most-significant $K$ bits of $\binom{N}{T}$ should induce an error of
at most . Formally, this precludes actually achieving 
the Shannon bound - but in practice, such tiny (and con- 
trollable) deviations are insignificant. A similar approach 
is almost always used in arithmetic coding, where the use 
of finite precision reduces computational complexity at 
the price of a tiny loss in compression efficiency. 
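As a small illustration of the approximate route, $\log_2 \binom{N}{T}$ can be estimated in floating point without ever forming the $O(N)$-bit integer. The sketch below uses the log-gamma function from the standard library as a stand-in for Stirling's series (an assumption on our part, not the paper's prescription), and the function names are ours.

```python
from math import comb, floor, lgamma, log

def log2_binom_approx(N, T):
    """Floating-point estimate of log2 C(N, T) from log-gamma values."""
    return (lgamma(N + 1) - lgamma(T + 1) - lgamma(N - T + 1)) / log(2)

def l_max_estimate(N, T):
    """Estimate of L_max = floor(log2 C(N, T)) using only O(log N)-bit numbers."""
    return floor(log2_binom_approx(N, T))

def l_max_exact(N, T):
    """Exact L_max, computed from the full integer for comparison."""
    return comb(N, T).bit_length() - 1
```

The estimate can disagree with the exact value when $\binom{N}{T}$ lies extremely close to a power of two, which is one concrete form of the small deviations mentioned above.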

A more practical approach is to offload as much
computation as possible onto a classical computer. While not
necessarily a good long-term strategy, this is a promising
solution as long as quantum memory is limited and
precious. To process the $N$th bit this way, we use the
classical computer to loop over every value of $T$ and $L$.
It computes $\binom{N}{T}_L$, uses this to design a unitary circuit,
and then applies that unitary conditional on $|T, L\rangle\langle T, L|$.
Since there are at most $N$ possible values each of $T$ and
$L$, and the unitary acts on a register of size $O(\log N)$,
the time required to process a single input symbol is
$O(N^2\,{\rm polylog}(N))$. This is undeniably ugly, but provides
a simple constructive approach to implementing our
algorithm in a bounded amount of time.



IV. A FULLY QUANTUM PROTOCOL: THE 
STREAMING SCHUR TRANSFORM 



The algorithm that we presented in the previous section
requires Alice and Bob to know something about the
state $|\psi\rangle$ describing their systems. Specifically, they need
to know the Schmidt basis, which we've written (without
loss of generality) as $\{|0\rangle, |1\rangle\}$, where

$$|\psi\rangle = \alpha |0_A 0_B\rangle + \beta |1_A 1_B\rangle. \qquad (30)$$

It is this knowledge of the Schmidt basis that reduces
the problem to classical randomness extraction. Note,
however, that Alice and Bob do not need to know $\alpha$ and
$\beta$. Our protocol is classically universal (i.e., independent
of the probabilities $|\alpha|^2$, $|\beta|^2$), but not quantumly
universal. In this section, we fix this problem and generate a
completely universal streaming protocol, by incorporating
the quantum Schur transform. The resulting algorithm
is a streaming implementation of Matsumoto and
Hayashi's optimal block concentration protocol [7].



A. Quantum types, representation theory, and
Schur-Weyl duality

The algorithm that we developed in previous sections
performs a particular transformation on strings. It
divides the $N$-bit input string into a permutation-invariant
type, and an index $a \in \{0, \ldots, \binom{N}{T} - 1\}$ into the type
class. (All the complicated business with $L$ is necessary
only because we want to efficiently convert $a$ into
random bits). This transformation is used frequently in
classical information theory, where it gives rise to the
method of types. Its usefulness arises because we are
dealing with a permutation-invariant distribution over
input strings, so separating out the permutation-invariant
part is handy. Furthermore, the index $a$ isn't just
permutation-dependent; it's uniformly random when the
input is permutation-invariant. This is because the
permutation group $S_N$ acts transitively on type classes -
i.e., given any two strings $s$, $s'$ in a type class, there is a
permutation that transforms $s \to s'$.
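The split of a string into a type and an index can be made concrete with standard enumerative coding. The sketch below is illustrative only: the paper does not fix an ordering within a type class, so the lexicographic ordering and the names `rank` and `unrank` are our assumptions.

```python
from math import comb

def rank(bits):
    """Map a bit string to (T, a): Hamming weight and lexicographic index
    of the string within its type class, with 0 <= a < C(N, T)."""
    T = sum(bits)
    a, ones_left, n = 0, T, len(bits)
    for i, b in enumerate(bits):
        if b == 1:
            # Strings sharing this prefix but with a 0 here come first.
            a += comb(n - i - 1, ones_left)
            ones_left -= 1
    return T, a

def unrank(N, T, a):
    """Inverse of rank: rebuild the N-bit string from its type and index."""
    bits, ones_left = [], T
    for i in range(N):
        zero_block = comb(N - i - 1, ones_left)
        if a >= zero_block:
            bits.append(1)
            a -= zero_block
            ones_left -= 1
        else:
            bits.append(0)
    return bits
```

For example, `rank([1, 0, 1])` gives `(2, 1)`, and `unrank(3, 2, 1)` recovers `[1, 0, 1]`. For an i.i.d. source, all strings with the same $T$ are equally likely, so $a$ is uniform on its range.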

In the absence of a preferred basis, we can't apply the
classical method of types directly. Instead, our algorithm
must deal with arbitrary vectors in the Hilbert space of
$N$-qubit quantum strings, $\mathcal{H} = (\mathbb{C}^2)^{\otimes N}$. Fortunately,
there is an analogous method of quantum types [20], and
a corresponding transformation on quantum strings that
divides them into a permutation-invariant "type" and an
"index" into that type class. This transformation is the
Schur transform [21], and after introducing it in this
section, we'll show how to combine it with our
randomness-distillation protocol.

We can apply permutations to $N$ qubits, just like $N$
classical bits. Each of the $N!$ permutations in the
symmetric group $S_N$ is represented by a $2^N \times 2^N$ unitary
operator acting on $\mathcal{H}$. These operators form a
representation of $S_N$. This representation is reducible, meaning
that $\mathcal{H}$ can be divided into a direct sum of subspaces
$\mathcal{H}_k$, each closed under the action of every permutation
in $S_N$. These subspaces are irreducible representation
spaces, a.k.a. "irreps", of $S_N$, and they are the quantum
equivalent of type classes.

The analogy between classical and quantum types is
not as straightforward as one might think from the
previous paragraph. To see this, let's consider the simplest
possible example: two qubits. Their Hilbert space is $\mathbb{C}^4$,
and the permutation group $S_2 = \{\mathbb{1}, \pi_{(12)}\}$ has two
elements. Since $\mathbb{1}$ acts trivially on all states, the irreps of $S_2$
are the eigenspaces of $\pi_{(12)}$. Its eigenvalues are $\{+1, -1\}$,
and its action on $\mathbb{C}^4$ defines two invariant subspaces: a
1-dimensional antisymmetric subspace (the "singlet"),

$$\mathcal{H}_{\rm antisymmetric} = {\rm Span}\left\{ \frac{|01\rangle - |10\rangle}{\sqrt{2}} \right\}, \qquad (31)$$

and a 3-dimensional symmetric subspace (the "triplet"),

$$\mathcal{H}_{\rm symmetric} = {\rm Span}\left\{ |00\rangle, \frac{|01\rangle + |10\rangle}{\sqrt{2}}, |11\rangle \right\}. \qquad (32)$$



The singlet is an irreducible representation space. The
triplet, however, is not irreducible - in fact, any proper
subspace of the triplet is itself invariant, since both
elements of $S_2$ act trivially on it. If we try to reduce the
triplet to a direct sum of irreps, we face an embarrassment
of choices - there is no preferred decomposition into
1-dimensional subspaces.

Just for contrast, consider the classical case of two bits.
There are three type classes: {00}, {01,10} and {11}.
Each is invariant under permutations, and "irreducible"
(meaning that it cannot be further subdivided). Both
00 and 11 are symmetric strings, but they are
distinguished from one another by their Hamming weight, and
by the existence (in classical theory) of a preferred set
of symbols, {0, 1}. If we chose $\{|0\rangle, |1\rangle\}$ as a preferred
basis for qubits, we could use it to divide the triplet into
irreducible subspaces spanned by $|00\rangle$,
$\frac{|01\rangle + |10\rangle}{\sqrt{2}}$, and $|11\rangle$.
However, the breaking of unitary symmetry is arbitrary
and unsatisfying.

That very unitary symmetry suggests a much more elegant
solution. The triplet and singlet are each invariant,
not only under permutations, but also under collective
unitary rotations. That is, we apply the same $U \in SU(2)$
to each qubit. Collective rotations of the form $U \otimes U$ (or
$U^{\otimes N}$ in general) are a representation of the group $SU(2)$,
and the singlet and triplet (being invariant under these
rotations) are representation spaces. Furthermore, they
are both irreducible representation spaces, for they have
no proper rotation-invariant subspaces.
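Both invariance statements for two qubits are easy to verify numerically. The following sketch uses plain Python lists rather than a linear-algebra library; the helper names (`kron2`, `apply`, `su2`) and the three-angle parameterisation of $SU(2)$ are our own choices for illustration.

```python
import cmath
import math

def kron2(U, V):
    """Kronecker product of two 2x2 matrices as a 4x4 nested list."""
    return [[U[i][j] * V[k][l] for j in range(2) for l in range(2)]
            for i in range(2) for k in range(2)]

def apply(M, v):
    """Multiply a 4x4 matrix by a length-4 vector."""
    return [sum(M[r][c] * v[c] for c in range(4)) for r in range(4)]

def su2(theta, phi, chi):
    """A generic 2x2 unitary with determinant 1."""
    a = cmath.exp(1j * phi) * math.cos(theta)
    b = cmath.exp(1j * chi) * math.sin(theta)
    return [[a, -b.conjugate()], [b, a.conjugate()]]

# Basis order: |00>, |01>, |10>, |11>.
SWAP = [[1, 0, 0, 0], [0, 0, 1, 0], [0, 1, 0, 0], [0, 0, 0, 1]]
s = 1 / math.sqrt(2)
singlet = [0, s, -s, 0]
triplet = [[1, 0, 0, 0], [0, s, s, 0], [0, 0, 0, 1]]

def close(u, v, tol=1e-12):
    """True if two vectors agree entrywise up to tol."""
    return all(abs(x - y) < tol for x, y in zip(u, v))
```

With these definitions, `apply(SWAP, singlet)` equals minus the singlet, each triplet vector is unchanged by `SWAP`, and for any `U = su2(...)` the vector `apply(kron2(U, U), singlet)` is again the singlet, while rotated triplet vectors stay symmetric under `SWAP`.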

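This is the simplest example of Schur-Weyl duality.
Schur-Weyl duality is the statement that, given a Hilbert
space $\mathcal{H}_d^{\otimes N}$:

1. The action of the symmetric group $S_N$ commutes
with the action of the collective rotation group
$SU(d)$, and

2. $\mathcal{H}_d^{\otimes N}$ decomposes into a direct sum of subspaces
$\mathcal{H}_\lambda$, each of which is the direct product of an irrep
of $SU(d)$ with an irrep of $S_N$:

$$\mathcal{H}_d^{\otimes N} = \bigoplus_\lambda \mathcal{H}_\lambda = \bigoplus_\lambda \mathcal{U}_\lambda \otimes \mathcal{P}_\lambda. \qquad (33)$$

For two qubits, there are two terms in the decomposition,
which we'll denote $\lambda = 0, 1$, so:

$$\mathcal{H}_2^{\otimes 2} = (\mathcal{U}_0 \otimes \mathcal{P}_0) \oplus (\mathcal{U}_1 \otimes \mathcal{P}_1). \qquad (34)$$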



Both representations of $S_2$ are trivial, so $\mathcal{P}_0$ and $\mathcal{P}_1$ are
both 1-dimensional. The triplet ($\mathcal{U}_1$) is a 3-dimensional
irrep of $SU(2)$, while the singlet ($\mathcal{U}_0$) is 1-dimensional.
We need to add a third qubit to obtain a nontrivial
symmetric group representation: the action of $S_3$ on three
qubits has two irreps, one of which is 2-dimensional.
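A quick dimension count makes the decomposition tangible. The sketch below labels each term by the row lengths $(n_1, n_2)$ of a two-row Young diagram, which is our own parameterisation rather than the paper's $\lambda$, and uses the standard dimension formulas for qubits: $n_1 - n_2 + 1$ for the $SU(2)$ irrep and $\binom{N}{n_2} - \binom{N}{n_2 - 1}$ for the symmetric-group irrep.

```python
from math import comb

def irrep_dims(N):
    """List (n1, n2, dim_U, dim_P) for every two-row diagram with N boxes."""
    rows = []
    for n2 in range(N // 2 + 1):
        n1 = N - n2
        dim_u = n1 - n2 + 1                                    # SU(2) irrep
        dim_p = comb(N, n2) - (comb(N, n2 - 1) if n2 else 0)   # S_N irrep
        rows.append((n1, n2, dim_u, dim_p))
    return rows

def total_dimension(N):
    """Sum of dim_U * dim_P over all terms; should equal 2**N."""
    return sum(du * dp for _, _, du, dp in irrep_dims(N))
```

For two qubits this gives `[(2, 0, 3, 1), (1, 1, 1, 1)]` (triplet and singlet), and for three qubits `[(3, 0, 4, 1), (2, 1, 2, 2)]`, where the 2-dimensional symmetric-group irrep first appears.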



In this decomposition of $N$-qubit strings, the $\mathcal{P}_\lambda$ spaces
correspond to type classes, while the irrep label $\lambda$ and
the $\mathcal{U}_\lambda$ spaces together correspond to the classical type.
This is a little confusing at first; why do we need two
variables to describe the "type" of a quantum string?
It makes more sense if we look at classical types in a
slightly different way. First, we note that whereas the
reversible transformations on a single qudit are unitaries
in $SU(d)$, the corresponding transformations on a
classical $d$-ary system are elements of $S_d$ - i.e.,
permutations of the $d$ symbols. So, we can divide a classical
type $T = \{n_1 \ldots n_d\}$ into two parts: (1) a sorted list of
frequencies $\{n_1 \geq n_2 \geq \cdots \geq n_d\}$, and (2) a
permutation in $S_d$ identifying which of the $d$ symbols appears
1st, 2nd, etc. in the sorted list. This view of classical
types turns out to be exactly analogous to the Schur-Weyl
decomposition. The irrep labels $\lambda$ correspond precisely
to nonincreasing partitions $\{n_1 \geq n_2 \geq \cdots \geq n_d\}$ (where
$\sum_k n_k = N$). These "frequencies" relate to the
eigenvalues of $\rho$ in exactly the same way that the type of an
$N$-bit string of i.i.d. symbols relates to the source
probabilities - if $\rho$ has eigenvalues $\{p_k\}$, then as $N$ gets large,
measuring the irrep label of $\rho^{\otimes N}$ gives $\{n_k\} \approx \{N p_k\}$
with high probability. The $\mathcal{U}_\lambda$ spaces carry information
about the diagonal basis of $\rho$.



B. Applying quantum types to concentration 

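If Alice and Bob share $N$ partially-entangled qubit
pairs in state $|\psi\rangle$, they describe their respective systems
by $\rho_A^{\otimes N}$ and $\rho_B^{\otimes N}$, where $\rho_A$ and $\rho_B$ are partial traces of
$|\psi\rangle\langle\psi|$. Because $\rho^{\otimes N}$ is permutation-invariant, it can be
decomposed according to Eq. 33 as

$$\rho^{\otimes N} = \bigoplus_\lambda p_\lambda \rho_\lambda \otimes \frac{\mathbb{1}_{\mathcal{P}_\lambda}}{\dim(\mathcal{P}_\lambda)}, \qquad (35)$$

a state that is maximally mixed over each type class $\mathcal{P}_\lambda$.
This follows from Schur's Lemma; if a matrix $\rho$ is
invariant under $\rho \to \pi \rho \pi^\dagger$ for all $\pi$ in a representation of a
group $G$, then $\rho$ is a direct sum of scalar matrices on the irreps of $G$.
The conditional states $p_\lambda \rho_\lambda$ on the various $SU(2)$ irreps
are determined by $\rho$, and aren't especially relevant to this
discussion.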

This is the quantum counterpart of the classical
observation that i.i.d. distributions of strings are uniformly
distributed within type classes. For the purposes of
entanglement concentration, the states on Alice's $\mathcal{P}_\lambda$
subspaces are not just uniformly random. They are
maximally entangled with their counterparts on Bob's side.
So if Alice and Bob each measure $\lambda$, they get identical
results $\lambda$, and are left with a state

$$\rho_A = \rho_B = \rho_\lambda \otimes \frac{\mathbb{1}_{\mathcal{P}_\lambda}}{\dim(\mathcal{P}_\lambda)}. \qquad (36)$$



Now, recall that they started with a pure state $|\psi\rangle^{\otimes N}$,
and performed a projective measurement. This means
that their post-measurement state is pure - and thus
their maximally mixed reduced states correspond to a
maximally entangled pure state

$$\frac{1}{\sqrt{\dim(\mathcal{P}_\lambda)}} \sum_{p} |p_A\rangle |p_B\rangle. \qquad (37)$$


If they want perfect EPR pairs, they may as well
discard the $\mathcal{U}_\lambda$ subsystem, which contains $O(\log N)$ bits of
non-maximal entanglement. They are left with a
maximally entangled state over the entire $\mathcal{P}_\lambda$ subspace. This
can be converted into EPR pairs by partitioning $\mathcal{P}_\lambda$ into
subspaces of dimension $2^L$ and measuring $L$, exactly as
explained in the proof of Theorem 3.
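To get a feel for the numbers, the distribution of the irrep label and the average of $\log_2 \dim(\mathcal{P}_\lambda)$ can be tabulated for a qubit state with eigenvalues $(p, 1-p)$. This is a rough sketch under our own conventions: the label is the second-row length $n_2$, the weight of each $SU(2)$ irrep is the standard character sum $\sum_{t=n_2}^{N-n_2} p^t (1-p)^{N-t}$, and the function names are hypothetical.

```python
from math import comb, log2

def dim_p(N, n2):
    """Dimension of the S_N irrep with second-row length n2."""
    return comb(N, n2) - (comb(N, n2 - 1) if n2 else 0)

def lambda_distribution(N, p):
    """Return {n2: probability of measuring that irrep label}."""
    q = 1 - p
    dist = {}
    for n2 in range(N // 2 + 1):
        # Trace of rho^{(x)N} over one copy of the SU(2) irrep.
        weight = sum(p**t * q**(N - t) for t in range(n2, N - n2 + 1))
        dist[n2] = dim_p(N, n2) * weight
    return dist

def mean_log_dim(N, p):
    """Average of log2 dim(P_lambda) over the label distribution."""
    return sum(prob * log2(dim_p(N, n2))
               for n2, prob in lambda_distribution(N, p).items())
```

For two maximally mixed qubits, `lambda_distribution(2, 0.5)` assigns probability 3/4 to the triplet and 1/4 to the singlet. For larger $N$, `mean_log_dim(N, p)` stays below $N H(p)$, with the shortfall attributable to the label and the discarded $\mathcal{U}_\lambda$ subsystem.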



C. The streaming Schur transform 

Matsumoto and Hayashi showed how to use the
Schur-Weyl decomposition (Eq. 33) and its properties to
achieve optimal universal compression [22] and
entanglement concentration [7]. These are non-constructive
information-theoretic results, like Shannon's
random-coding proof of channel capacity, rather than
practical implementations. However, Bacon et al. recently
demonstrated a quantum algorithm to perform the quantum
Schur transform, which points the way to implementing
these protocols efficiently on a quantum computer [21].
Our goal in this section is to use the Bacon
et al. algorithm as a building block for a streaming
concentration/compression protocol.

The Schur transform transforms an $N$-qubit Hilbert
space $\mathcal{H}_2^{\otimes N}$ into the direct-sum Hilbert space given in
Eq. 33.



This is just a change of basis - but, then, every unitary
transformation is "just" a change of basis. The Schur
transform takes as input a single $N$-qubit register, and
outputs three quantum registers of different sizes. We'll
call these registers T, U, and P, and in the following list
we describe each register and give an example of what its
state would be for an input string $\rho^{\otimes N}$.

1. The T register holds the irrep label $\lambda$. It is spanned
by a basis $\{|\lambda\rangle : \lambda = \ldots\}$. Measuring the
T register provides the best possible estimate of
$\rho$'s spectrum - i.e., whether the individual qubits
of the input state are consistently aligned along a
particular direction in $\mathcal{H}_2$.

2. The U register holds the state of the $SU(2)$ irrep
$\mathcal{U}_\lambda$. The dimension of $\mathcal{U}_\lambda$ depends on $\lambda$, so U has
to be big enough to hold the largest $\mathcal{U}_\lambda$, which is
$(N+1)$-dimensional. U is spanned by a basis
$\{|m\rangle : m = 0 \ldots N\}$. Measuring the U register provides
the best possible estimate of the eigenbasis of $\rho$ -
which, for qubits, is equivalent to the direction of its
Bloch vector. Unlike the T register, the U register
does not have a unique basis in which we would
measure it to extract information. Measuring the
$\{|m\rangle\}$ basis yields the best estimate of the input
string's Hamming weight in the $\{|0\rangle, |1\rangle\}$ basis, but
if we wanted to know its Hamming weight in the
$\{|+\rangle, |-\rangle\}$ basis, a different measurement would be
optimal.

3. The P register holds the state of the $S_N$ irrep. As
with U, this register must be large enough that
we can embed any of the $\mathcal{P}_\lambda$ spaces into it. In
fact, it must be at least $2^N / O(N^2)$-dimensional, because
we're mapping $\mathcal{H}_2^{\otimes N}$ into $T \otimes U \otimes P$, yet both T
and U are $O(N)$-dimensional. In the Bacon et al.
implementation, the P register comprises exactly
$N$ qubits, denoted $\{p_1, p_2, \ldots, p_N\}$. When we
Schur-transform $\rho^{\otimes N}$, measurements on this register
yield random results.

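The key ingredient in the Schur transform is the
Clebsch-Gordan transform. It takes as its input the T and U
registers, along with the $n$th qubit $s_n$, and outputs
updated T and U registers along with the $n$th qubit of the
P register, $p_n$.

[Circuit diagram: the Clebsch-Gordan transform $U_{CG}$ takes $|T\rangle$, $|U\rangle$, and an input qubit, and outputs $|T'\rangle$, $|U'\rangle$, and $|p\rangle$.]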
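For qubits, each Clebsch-Gordan step couples a spin-$J$ irrep to one spin-1/2 system, giving total spin $J + 1/2$ or $J - 1/2$. As a hedged illustration, the sketch below evaluates the textbook coupling coefficients in the Condon-Shortley sign convention; the function name and return format are ours, it is not taken from the Bacon et al. construction, and how the register encoding of T, U, and $p_n$ maps onto $(J, M)$ is left open here.

```python
from math import sqrt

def cg_spin_half(J, M):
    """Clebsch-Gordan coefficients for spin J coupled to spin 1/2.

    M is the total z-projection, with |M| <= J + 1/2 (the caller must
    supply a valid M). Returns {J_total: (c_up, c_down)}, where the
    coupled state is
        c_up * |J, M - 1/2>|up> + c_down * |J, M + 1/2>|down>.
    """
    d = 2 * J + 1
    c_up = sqrt((J + M + 0.5) / d)
    c_down = sqrt((J - M + 0.5) / d)
    result = {J + 0.5: (c_up, c_down)}
    # The lower irrep exists only if J > 0 and M fits inside it.
    if J > 0 and abs(M) <= J - 0.5:
        result[J - 0.5] = (-c_down, c_up)
    return result
```

For $J = 1/2$, $M = 0$ this reproduces the two-qubit example: total spin 1 has coefficients $(1/\sqrt{2}, 1/\sqrt{2})$, the $M = 0$ triplet state, and total spin 0 has $(-1/\sqrt{2}, 1/\sqrt{2})$, the singlet of Eq. 31 when "up" is identified with $|0\rangle$. The two coefficient vectors are orthonormal, as required of a unitary change of basis.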



The full Schur transform then consists of initializ- 
ing the T and U registers, then sequentially apply- 
ing Clebsch-Gordan transforms to each of the N input 
qubits: 



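[Circuit diagram: the input qubits $|b_1\rangle, |b_2\rangle, |b_3\rangle, \ldots, |b_N\rangle$ pass one at a time through successive $U_{CG}$ gates; each gate emits one qubit of the P register ($|p_1\rangle, |p_2\rangle, \ldots, |p_N\rangle$), and the $|T\rangle$ and $|U\rangle$ registers emerge from the final gate.]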



Just a brief glance at the circuit above shows that this
implementation of the Schur transform is appropriate for
a streaming protocol. It addresses the input qubits one at
a time, and never reuses an earlier qubit. The only
problem is that the P register, holding the $S_N$ irrep, is not
in the right form. Actually, this is a fairly serious
problem for any application to concentration or compression,
because the P register comprises N qubits - no matter
what the input is. For each input qubit $s_n$, exactly one
$p_n$ gets emitted, so the entropy of the input qubits is
uniformly distributed across the N qubits $\{p_n\}$, rather than
being compressed.

Compressing the P register requires a peek at the
representation theory of $S_N$. As we mentioned above, the
irreps of $S_N$ are labeled by an index $\lambda$, whose values
are in 1:1 correspondence with nonincreasing sequences
of at most d integers, $\{n_1 \geq n_2 \geq \cdots \geq n_d\}$ where
$\sum_k n_k = N$. These sequences are usually depicted by
Young diagrams, arrays of N boxes in at most d rows,
with $n_k$ boxes in their kth row. Here is the Young
diagram for an irrep of $S_6$:

[ ][ ][ ][ ]   $n_1 = 4$
[ ][ ]         $n_2 = 2$

Because the diagram has only 2 rows, it labels irreps
of $S_6$ and SU(2) which appear in the decomposition of
$\mathcal{H}_2^{\otimes 6}$. Diagrams with more than 2 rows are not relevant to
qubits; they label valid representations of $S_N$, but not of
SU(2). This particular diagram corresponds (roughly) to
the class of strings with 4 qubits aligned along a common
axis and 2 qubits aligned against that axis.

Now, when the Schur transform circuit addresses the
Nth input qubit, $N-1$ qubits have already been
transformed. The state of the T register is therefore a
superposition or mixture of states corresponding to Young
diagrams with $N-1$ boxes (i.e., $|\lambda = \{n_1, N-1-n_1\}\rangle$).
Adding another qubit corresponds to adding another box
to the Young diagram. We can add it to the first row, or
add it to the second row if the second row isn't already
as long as the first row. As we read more qubits in, the
T register (following this rule) traverses Young's lattice:




This looks quite a bit like Pascal's triangle, and it
plays exactly the same role. Different types correspond
to different locations in the lattice (plus, in the quantum
case, an SU(2) register that's not shown), while different
strings within a type class correspond to distinct paths.
In Young's lattice, each node is labeled by a Young
diagram, which labels an irrep of $S_n$ (i.e., a quantum type
class). Each of the paths to a given node corresponds
to a distinct state within that class. Thus, by counting
the paths to a node, we obtain the dimension of each
representation space:
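
The path-counting rule can be sketched in a few lines of Python, in the same way one builds Pascal's triangle row by row. This is a minimal illustration restricted to two-row diagrams (the qubit case); the function name and the dictionary layout are assumptions made for the example, not notation from the paper.

```python
def young_lattice_path_counts(n_max):
    """Count paths from the empty diagram to every two-row diagram.

    counts[n][t] is the number of paths to the diagram with n - t boxes
    in the first row and t boxes in the second row.
    """
    counts = [{0: 1}]  # row 0: the empty diagram, reached by one (empty) path
    for n in range(1, n_max + 1):
        row = {}
        for t, ways in counts[-1].items():
            first, second = (n - 1) - t, t
            # Step right: add a box to the first row (always allowed).
            row[t] = row.get(t, 0) + ways
            # Step left: add a box to the second row, allowed only while
            # the second row is strictly shorter than the first.
            if second < first:
                row[t + 1] = row.get(t + 1, 0) + ways
        counts.append(row)
    return counts


if __name__ == "__main__":
    for n, row in enumerate(young_lattice_path_counts(6)):
        print(n, [row[t] for t in sorted(row)])
```

Row 6 of the output gives the dimensions of the four $S_6$ irreps with at most two rows; the entry for the diagram with rows of length 4 and 2 is 9.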




The representation spaces don't have a unique basis,
but the path-counting procedure above suggests a
convenient basis known as Young's orthogonal basis. We
simply assign to each path p a basis state $|p\rangle$. Paths to a
node in the Nth row of Young's lattice consist of N steps,
and each step is either to the right (meaning we add a
box to the first row of the Young diagram) or to the left
(meaning we add a box to the second row). Clearly, any
path can be denoted by a sequence of N symbols from
the set {L = left, R = right}, e.g. p = RRLLR..., and
thus we can encode all such paths into N bits - or, since
we are dealing with quantum strings, and can traverse
Young's lattice in superposition, into N qubits.
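
To make the path labels concrete, here is a small Python sketch that lists the L/R strings reaching a given two-row diagram and writes each one as N bits. It is purely classical and illustrative: the function names are hypothetical, and the bit convention (R as 0, L as 1) is an arbitrary choice made for the example.

```python
def paths_to(n_first, n_second):
    """List all lattice paths to the two-row diagram (n_first, n_second).

    Each path is a string over {"R", "L"}: "R" adds a box to the first
    row, "L" adds a box to the second row.
    """
    results = []

    def extend(prefix, first, second):
        if first == n_first and second == n_second:
            results.append(prefix)
            return
        if first < n_first:
            extend(prefix + "R", first + 1, second)
        # The second row may grow only while it is shorter than the first.
        if second < n_second and second < first:
            extend(prefix + "L", first, second + 1)

    extend("", 0, 0)
    return results


def path_to_bits(path):
    """Encode a path as one bit per step (R -> 0, L -> 1)."""
    return [0 if step == "R" else 1 for step in path]


if __name__ == "__main__":
    for path in paths_to(4, 2):
        print(path, path_to_bits(path))
```

The diagram with rows of length 4 and 2 is reached by 9 distinct paths, one for each basis state of the corresponding irrep. Every path uses 6 symbols even though only 9 of the 64 possible strings occur, which is the redundancy discussed next.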

This encoding is not efficiently compressed, nor is it 
appropriate for entanglement concentration. Since the 
P register contains complete information about the path 
taken through Young's lattice, it also contains informa- 
tion about the end-point of the path - i.e., about the 
irrep label stored in T. Moreover, for strings in high- 
weight irreps, most of the steps will be to the right, so 
most of the $p_k$ bits will be "R". We need to compress the
P register in order to extract EPR pairs from it. 

Our algorithm is almost perfectly suited to this. In 
fact, it can be applied directly with only two changes: 

1. Our algorithm traversed the lattice of Pascal's tri- 
angle, whose nodes' sizes are binomial coefficients. 
We need to adapt it to traverse Young's lattice, 
whose nodes have different sizes. The dimension of 
an irrep Y of $S_N$ is given by the hook length formula:

(a) Draw the Young diagram. 

(b) To each of the N boxes x in the Young diagram,
assign a "hook length" h(x), which is
the sum of (i) the number of boxes to the right
of x; (ii) the number of boxes directly below
x; and (iii) 1 for x itself.

(c) The size of Y is given by

$$\mathrm{Size}(Y) = \frac{N!}{\prod_x h(x)}.$$

A short calculation for the Young diagram with $N-T$
boxes in the first row and T in the second row
gives

$$\mathrm{Size}(Y) = \binom{N}{T}\,\frac{N-2T+1}{N-T+1}. \qquad (39)$$



So the size of a quantum type class is very nearly
equal to the size of the corresponding classical type
class, with a simple rational function giving the
discrepancy. Every instance of "calculate a binomial
coefficient" in our original algorithm gets replaced
by "calculate the corresponding irrep dimension".

2. Instead of performing operations conditional on the
classical Hamming weight T, we condition our
operations on the irrep label T. Since all the operations
in our algorithm are necessarily coherent anyway,
this substitution requires no significant modification.
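
The hook length recipe in item 1 is easy to check against the closed form of Eq. (39). The Python sketch below is an illustration under the definitions given above (hook length = boxes to the right + boxes directly below + 1); the function names are hypothetical and do not come from the paper.

```python
from math import comb, factorial


def hook_length_dimension(rows):
    """Dimension of the S_N irrep for a Young diagram given as row lengths."""
    n_boxes = sum(rows)
    product = 1
    for r, row_len in enumerate(rows):
        for c in range(row_len):
            right = row_len - c - 1  # boxes to the right of (r, c)
            # Boxes directly below (r, c): lower rows long enough to reach column c.
            below = sum(1 for lower in rows[r + 1:] if lower > c)
            product *= right + below + 1
    return factorial(n_boxes) // product


def two_row_dimension(n, t):
    """Closed form of Eq. (39) for the diagram with rows (n - t, t)."""
    return comb(n, t) * (n - 2 * t + 1) // (n - t + 1)


if __name__ == "__main__":
    for n in range(1, 9):
        for t in range(n // 2 + 1):
            general = hook_length_dimension([n - t, t])
            closed = two_row_dimension(n, t)
            print(n, t, general, closed, comb(n, t))
```

The last column prints the classical type-class size $\binom{N}{T}$ for comparison, so the rational correction factor can be read off directly.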

This defines what we will call the quantum streaming 
Elias transform. A single step of the transform can be 
represented as a unitary operation $U_E$.

[Circuit diagram: $U_E$ acting on the $|p\rangle$ and $|L\rangle$ registers, conditional on $|T\rangle$.]



$U_E$ acts on two registers - an SU(2)-invariant qubit
$|p\rangle$ and the bin size $|L\rangle$ - conditional on a third, the
irrep label $|T\rangle$. It also has access to two variable-length
tapes. One is output-only, and holds EPR pair halves.
The other is bidirectional, and holds pure $|0\rangle$ qubits. The
$|p\rangle$ qubit always goes out onto one tape or the other -
but sometimes, $U_E$ also pops one or more qubits off the
purity tape, fills them with entanglement from the $|T\rangle$
and $|L\rangle$ registers, and pushes them out the EPR tape.

We can use this protocol, together with the Schur 
transform, to make a completely universal extraction pro- 
tocol. 

Protocol 2. 




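[Circuit diagram for Protocol 2: the input qubit $|b_N\rangle$ together with the $|L\rangle$, $|T\rangle$, and $|U\rangle$ registers.]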



1. Each new qubit is Clebsch-Gordan transformed,
yielding updated $|T\rangle$ and $|U\rangle$ registers, and
an SU(2)-invariant qubit $|p_n\rangle$.

2. $|T\rangle$ and $|p_n\rangle$ are fed into a quantum Elias
transform, along with $|L\rangle$.

3. The physical input qubit $|b_n\rangle$, suitably transformed
into either 0 or 1 EPR pair-half, emerges immediately
on one of the two tapes.

A few remarks are in order here. Our algorithm is
basically an interleaving of the Schur transform with the
quantum Elias transform. These two components are
coupled only by $|T\rangle$; the $|U\rangle$ and $|L\rangle$ registers are
used only by the Schur and Elias components (respectively).
We have described the protocol's fully streaming mode,
where N can be assumed classical. On-demand mode
requires another quantum register for $|N\rangle$. The integer
registers ($|T\rangle$, $|L\rangle$, $|U\rangle$) must grow with N. If quantum
memory is at a premium, pure qubits may be scavenged
from the end of the purity tape. However, the purity tape
is also used by $U_E$ as a source of fresh qubits, whenever
it reads a single $|p\rangle$ bit and outputs more than one EPR
pair-half.



V. DISCUSSION 

This is the first adaptive (streaming and universal) pro- 
tocol for entanglement concentration. Because it runs in 
very little space, it can be implemented using current (or 
near-future) technology. This opens the door for experi- 
mental implementations of a variety of information the- 
oretic protocols. We have already used the ideas in this 
paper to design sequential protocols for optimal quan- 
tum data compression and state discrimination, which 
use $O(\log N)$ or even $O(1)$ memory.

Although this protocol can be used for quantum data
compression (details will be given elsewhere), good data
compression algorithms can fail at entanglement
concentration. There are many ways to encode compressed data
which do not meet the (more stringent) structure
requirements for concentrated EPR pairs. Reversible
entanglement concentration, on the other hand, seems to
necessarily yield data compression [25]. Entanglement
concentration seems to have stricter requirements than
compression. Given the role of compression in information
theory, this suggests that more insights can be gained by
applying the stricter requirements of concentration.

Our protocol bolts together two components. It seems 
possible to regard either component of the algorithm as 
"trivial". From one perspective, the Schur transform 
does all the heavy quantum lifting; our algorithm just 
compresses the P register. However, consider the classi- 
cal version of this protocol. A classical Schur transform 
does the following: 

1. It counts the number of "1" bits in the input to
obtain the Hamming weight T.

2. It separates the type into two registers: (a) a single-bit
"dictionary" U that identifies whether T or $N-T$
is bigger (i.e., whether 0 or 1 appears more often);
and (b) a "sorted type" $\max(T, N-T)$.

3. It strips out the dictionary information (U), by
replacing the kth input bit $s_k$ with a dictionary-invariant
bit $p_k = s_k \oplus U$. This ensures that the
$\{p_k\}$ are invariant under any "collective rotation"
of the entire string.
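
The three steps above can be sketched as a short Python function. This is a block-mode illustration only (it reads the whole string before fixing U), not the streaming construction; the function names are hypothetical, and the handling of a tie T = N - T (U is set to 0) is an assumption, since the text does not specify it.

```python
def classical_schur_transform(bits):
    """Split a bit string into (sorted type, dictionary bit U, invariant bits)."""
    n = len(bits)
    t = sum(bits)                  # step 1: Hamming weight T
    u = 1 if t > n - t else 0      # step 2(a): which symbol is the majority
                                   # (a tie gives U = 0; this choice is an assumption)
    sorted_type = max(t, n - t)    # step 2(b): the "sorted type"
    p = [s ^ u for s in bits]      # step 3: p_k = s_k XOR U
    return sorted_type, u, p


def invert_classical_schur_transform(u, p):
    """Recover the original bits from the dictionary bit and the invariant bits."""
    return [pk ^ u for pk in p]


if __name__ == "__main__":
    bits = [1, 1, 0, 1, 1]
    flipped = [1 - b for b in bits]  # the classical "collective rotation"
    print(classical_schur_transform(bits))
    print(classical_schur_transform(flipped))
```

Both calls return the same sorted type and the same $\{p_k\}$; only U differs. With the tie convention assumed here, an exactly balanced string is the one case where flipping every bit changes the $\{p_k\}$.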

Most of this is computationally trivial. The most signifi- 
cant step is adding up the Hamming weight of the input. 
So the classical equivalent of the Schur transform is ba- 
sically sequential addition - which we took for granted 
in our implementation of the streaming Elias protocol! 
Stripping the dictionary register U out of the $\{s_k\}$, which
seems optional (and, in fact, rather arbitrary) in the
classical variant, is a necessary part of quantum sequential
addition; the no-cloning theorem prohibits us from
copying information, so in order to calculate and store it in U,
we must remove all traces of it from the other registers.

The previous paragraph should not be taken to imply
that the Schur transform itself is in any way trivial.
Rather, we are suggesting that the Schur transform can
be seen as the fully quantum analogue of sequential
addition. This isn't actually all that surprising, since the
main application of Clebsch-Gordan coefficients is in the
addition of angular momentum. Nonetheless, there is a
subtle distinction worth noting: whereas Clebsch-Gordan
coefficients are used to do classical calculations about
quantum systems, the Schur transform is a fully quantum
physical operation. A similar distinction divides classical
simulation of a quantum system from quantum
simulation of a quantum system.

Finally, our construction has implications for quantum 
learning. Adaptive classical protocols are closely tied 
to machine learning. Our protocol demonstrates how a 
quantum computer can "learn" a quantum source, and 
adapt its strategy, without ever making a measurement 
or collapsing the input state. 